Pulsar timing arrays (PTAs) perform Bayesian posterior inference with
expensive MCMC methods. Given a dataset of ~10-100 pulsars and O(10^3) timing
residuals each, producing a posterior distribution for the stochastic
gravitational wave background (SGWB) can take days to a week. The computational
bottleneck is the likelihood evaluation required by MCMC, which is extremely
costly given the dimensionality of the search space.
Fortunately, generating simulated data is fast, so modern simulation-based
inference techniques can be brought to bear on the problem. In this paper, we
demonstrate how conditional normalizing flows trained on simulated data can be
used for extremely fast and accurate estimation of the SGWB posteriors,
reducing the sampling time from weeks to a matter of seconds.
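The premise that cheap simulation can replace expensive likelihood evaluation can be illustrated with the simplest simulation-based inference method, rejection ABC. This is only a toy sketch of the general idea (the paper trains conditional normalizing flows, not ABC), with a hypothetical one-parameter Gaussian simulator:

```python
import random
import statistics

# Toy simulation-based inference via rejection ABC (illustrative only;
# the paper uses conditional normalizing flows, not ABC).
# Unknown parameter: mean mu of a Gaussian; observed summary: sample mean.

random.seed(0)

def simulate(mu, n=50):
    """Fast simulator: draw n residual-like samples and return their mean."""
    return statistics.fmean(random.gauss(mu, 1.0) for _ in range(n))

observed = simulate(2.0)          # stand-in for the real dataset's summary
eps = 0.05                        # acceptance tolerance

# Draw parameters from the prior, keep those whose simulations match the data.
posterior = [mu for mu in (random.uniform(0, 4) for _ in range(20000))
             if abs(simulate(mu) - observed) < eps]

approx_mean = statistics.fmean(posterior)
```

The accepted draws approximate the posterior; a flow-based approach instead amortizes this step by learning the conditional density once from simulated pairs.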
Randomized experimental comparisons of alternative pedagogical strategies
could provide useful empirical evidence in instructors' decision-making.
However, traditional experiments offer no clear and simple pathway for rapidly
using incoming data to increase the chances that students in an experiment
receive the best conditions. Drawing inspiration from the use of machine
learning and experimentation in product development at leading technology
companies, we explore how adaptive experimentation might help in continuous
course improvement. In adaptive experiments, as different arms/conditions are
deployed to students, data is analyzed and used to change the experience for
future students. This can be done using machine learning algorithms to identify
which actions are more promising for improving student experience or outcomes.
This algorithm can then dynamically deploy the most effective conditions to
future students, resulting in better support for students' needs. We illustrate
the approach with a case study providing a side-by-side comparison of
traditional and adaptive experimentation of self-explanation prompts in online
homework problems in a CS1 course. This provides a first step in exploring how
this methodology can help bridge research and practice in continuous course
improvement.
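The deploy-analyze-adapt loop the abstract describes is commonly implemented with a bandit algorithm such as Thompson sampling. A minimal sketch with two hypothetical prompt conditions and Bernoulli outcomes (the paper's actual algorithm and reward definitions may differ):

```python
import random

# Thompson sampling over two hypothetical conditions (toy sketch).
random.seed(1)

true_rates = [0.55, 0.70]        # unknown success rates of conditions A and B
successes = [0, 0]
failures = [0, 0]

for _ in range(2000):            # each iteration = one student
    # Sample a plausible success rate for each arm from its Beta posterior.
    samples = [random.betavariate(successes[a] + 1, failures[a] + 1)
               for a in range(2)]
    arm = samples.index(max(samples))           # deploy the most promising arm
    reward = random.random() < true_rates[arm]  # observe the student outcome
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1

plays = [successes[a] + failures[a] for a in range(2)]
```

As evidence accumulates, the algorithm routes most future students to the stronger condition while still exploring the weaker one occasionally.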
Graph neural networks (GNNs) have gained significant popularity due to their
powerful capability to extract useful representations from graph data. As the
need for efficient GNN computation intensifies, a variety of programming
abstractions designed for optimizing GNN Aggregation have emerged to facilitate
acceleration. However, there has been no comprehensive evaluation and analysis
of existing abstractions, and thus no clear consensus on which approach is better. In
this letter, we classify existing programming abstractions for GNN Aggregation
by the dimension of data organization and propagation method. By constructing
these abstractions on a state-of-the-art GNN library, we perform a thorough and
detailed characterization study to compare their performance and efficiency,
and provide several insights on future GNN acceleration based on our analysis.
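The operation these abstractions optimize is neighbor aggregation: gather each node's neighbor features and reduce them. A minimal sketch with mean aggregation over a hypothetical 4-node graph, using plain Python lists for clarity:

```python
# Minimal sketch of GNN neighbor aggregation (the operation the surveyed
# abstractions optimize), on a hypothetical 4-node graph.

neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# One 2-dimensional feature vector per node.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [4.0, 0.0]}

def mean_aggregate(node):
    """Average the feature vectors of a node's neighbors."""
    nbrs = neighbors[node]
    dim = len(features[node])
    return [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]

agg = {v: mean_aggregate(v) for v in neighbors}
# Node 0 averages nodes 1 and 2: [(0+2)/2, (1+2)/2] = [1.0, 1.5]
```

Real abstractions differ mainly in how they organize this gather/reduce (dense vs. sparse data layouts, push vs. pull propagation), not in the mathematical operation itself.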
The performance of neural networks has been significantly improved by
increasing the number of channels in convolutional layers. However, this
increase in performance comes with a higher computational cost, resulting in
numerous studies focused on reducing it. One promising approach to address this
issue is group convolution, which effectively reduces the computational cost by
grouping channels. However, to the best of our knowledge, there has been no
theoretical analysis on how well the group convolution approximates the
standard convolution. In this paper, we mathematically analyze the
approximation of the group convolution to the standard convolution with respect
to the number of groups. Furthermore, we propose a novel variant of the group
convolution called balanced group convolution, which achieves a closer
approximation to the standard convolution at a small additional computational cost. We provide
experimental results that validate our theoretical findings and demonstrate the
superior performance of the balanced group convolution over other variants of
group convolution.
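The cost reduction can be seen by counting weights: a standard convolution with a square k x k kernel has c_in * c_out * k * k weights, while with g groups each output channel only connects to c_in / g input channels. A small arithmetic sketch (hypothetical layer sizes):

```python
# Weight counts for standard vs. group convolution (square k x k kernels,
# bias terms omitted). With g groups, each output channel connects to only
# c_in // g input channels, so the weight count drops by a factor of g.

def conv_weights(c_in, c_out, k, groups=1):
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k

standard = conv_weights(256, 256, 3)            # 256 * 256 * 9 = 589824
grouped = conv_weights(256, 256, 3, groups=4)   # 64 * 256 * 9  = 147456
# The grouped layer uses 4x fewer weights, matching the number of groups.
```

The paper's question is how closely the cheaper grouped operator can approximate the full one as g varies.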
Molecular language modeling is an effective approach to generating novel
chemical structures. However, these models do not \emph{a priori} encode
certain preferences a chemist may desire. We investigate the use of fine-tuning
using Direct Preference Optimization to better align generated molecules with
chemist preferences. Our findings suggest that this approach is simple,
efficient, and highly effective.
On-device machine learning (ML) enables the training process to exploit a
massive amount of user-generated private data samples. To enjoy this benefit,
inter-device communication overhead should be minimized. To this end, we
propose federated distillation (FD), a distributed model training algorithm
whose communication payload size is much smaller than a benchmark scheme,
federated learning (FL), particularly when the model size is large. Moreover,
user-generated data samples are likely to become non-IID across devices, which
commonly degrades the performance compared to the case with an IID dataset. To
cope with this, we propose federated augmentation (FAug), where each device
collectively trains a generative model, and thereby augments its local data
towards yielding an IID dataset. Empirical studies demonstrate that FD with
FAug yields around 26x less communication overhead while achieving 95-98% test
accuracy compared to FL.
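The payload gap is easy to see by counting bytes: FL uploads all model parameters each round, while FD uploads only per-label average logit vectors. A rough back-of-the-envelope sketch with hypothetical sizes (not the paper's exact configuration):

```python
# Rough payload comparison behind FD vs. FL (illustrative numbers only).
# FL exchanges model parameters; FD exchanges per-label average logits.

def fl_payload(num_params, bytes_per_value=4):
    return num_params * bytes_per_value

def fd_payload(num_labels, bytes_per_value=4):
    # One averaged logit vector (length num_labels) per label.
    return num_labels * num_labels * bytes_per_value

model_params = 1_000_000           # hypothetical model size
labels = 10                        # e.g. a 10-class task

ratio = fl_payload(model_params) / fd_payload(labels)
# FL sends 4 MB per round here; FD sends 400 bytes.
```

Because the FD payload scales with the number of labels rather than the number of parameters, the gap widens as models grow.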
This work introduces the first toolkit around path-norms that is fully able
to encompass general DAG ReLU networks with biases, skip connections and any
operation based on the extraction of order statistics: max pooling, GroupSort
etc. This toolkit notably allows us to establish generalization bounds for
modern neural networks that are not only the most widely applicable path-norm
based ones, but also recover or beat the sharpest known bounds of this type.
These extended path-norms further enjoy the usual benefits of path-norms: ease
of computation, invariance under the symmetries of the network, and improved
sharpness on feedforward networks compared to the product of operator norms,
the other most commonly used complexity measure.
The versatility of the toolkit and its ease of implementation allow us to
challenge the concrete promises of path-norm-based generalization bounds, by
numerically evaluating the sharpest known bounds for ResNets on ImageNet.
A metric tensor for Riemann manifold Monte Carlo particularly suited for
non-linear Bayesian hierarchical models is proposed. The metric tensor is built
from symmetric positive semidefinite log-density gradient covariance (LGC)
matrices, which are also proposed and further explored here. The LGCs
generalize the Fisher information matrix by measuring the joint information
content and dependence structure of both a random variable and the parameters
of said variable. Consequently, positive definite Fisher/LGC-based metric
tensors may be constructed not only from the observation likelihoods as is
current practice, but also from arbitrarily complicated non-linear prior/latent
variable structures, provided the LGC may be derived for each conditional
distribution used to construct said structures. The proposed methodology is
highly automatic and allows for exploitation of any sparsity associated with
the model in question. When implemented in conjunction with a Riemann manifold
variant of the recently proposed numerical generalized randomized Hamiltonian
Monte Carlo processes, the proposed methodology is highly competitive, in
particular for the more challenging target distributions associated with
Bayesian hierarchical models.
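For reference, the Fisher information matrix that the LGC generalizes can itself be written as a covariance of the log-density gradient (the score); this standard identity is stated here only for context:

```latex
% Fisher information as the covariance of the log-density gradient (score):
% the LGC construction extends this idea beyond observation likelihoods.
\[
  \mathcal{I}(\theta)
  = \mathbb{E}_{y \sim p(\cdot \mid \theta)}
    \left[ \nabla_\theta \log p(y \mid \theta)\,
           \nabla_\theta \log p(y \mid \theta)^{\top} \right].
\]
```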
Inference on modern Bayesian Neural Networks (BNNs) often relies on a
variational treatment that imposes frequently violated assumptions about
independence and the form of the posterior. Traditional MCMC approaches avoid
these assumptions at the cost of increased computation due to their
incompatibility with subsampling of the likelihood. New Piecewise Deterministic
Markov Process (PDMP) samplers permit subsampling, though they introduce
model-specific inhomogeneous Poisson processes (IPPs) which are difficult to sample from. This
work introduces a new generic and adaptive thinning scheme for sampling from
these IPPs, and demonstrates how this approach can accelerate the application
of PDMPs for inference in BNNs. Experimentation illustrates how inference with
these methods is computationally feasible, can improve predictive accuracy,
MCMC mixing performance, and provide informative uncertainty measurements when
compared against other approximate inference schemes.
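The generic thinning idea the paper's adaptive scheme builds on: simulate an inhomogeneous Poisson process with rate lambda(t) by generating candidates from a dominating constant rate Lambda >= lambda(t) and accepting each with probability lambda(t)/Lambda. A sketch with a hypothetical rate function (not the paper's specific algorithm):

```python
import math
import random

# Thinning for an inhomogeneous Poisson process (the generic scheme that
# adaptive thinning refines, not the paper's specific algorithm).
# Target rate lambda(t) = 1 + sin(t)^2 on [0, T], dominated by LAMBDA = 2.

random.seed(2)

def rate(t):
    return 1.0 + math.sin(t) ** 2

LAMBDA = 2.0          # constant upper bound: rate(t) <= LAMBDA for all t
T = 1000.0

events, t = [], 0.0
while True:
    t += random.expovariate(LAMBDA)         # next candidate at rate LAMBDA
    if t > T:
        break
    if random.random() < rate(t) / LAMBDA:  # thin: accept w.p. rate(t)/LAMBDA
        events.append(t)

# Expected count is the integral of rate(t) over [0, T], roughly 1.5 * T.
```

The efficiency of thinning hinges on how tight the bound Lambda is, which is exactly what an adaptive scheme tries to improve.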
Compressing neural networks is a key step when deploying models for real-time
or embedded applications. Factorizing the model's matrices using low-rank
approximations is a promising method for achieving compression. While it is
possible to set the rank before training, this approach is neither flexible nor
optimal. In this work, we propose a post-training rank-selection method called
Rank-Tuning that selects a different rank for each matrix. Used in combination
with training adaptations, our method achieves high compression rates with
little or no performance degradation. Our numerical experiments on signal
processing tasks show that we can compress recurrent neural networks up to 14x
with at most 1.4% relative performance reduction.
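The bookkeeping behind low-rank compression: an m x n matrix (mn weights) becomes A (m x r) times B (r x n), i.e. r(m + n) weights, so rank selection trades rank against size. A toy selection criterion based on a fixed compression budget (hypothetical; the paper tunes each rank against task performance, not a fixed budget):

```python
# Parameter counts for low-rank factorization: an m x n matrix (mn weights)
# becomes A (m x r) times B (r x n), i.e. r * (m + n) weights.

def factored_size(m, n, r):
    return r * (m + n)

def max_rank_under_budget(m, n, compression=4):
    """Largest r whose factorization is at least `compression`x smaller."""
    budget = (m * n) // compression
    r = budget // (m + n)
    return max(r, 1)

m, n = 512, 512
r = max_rank_under_budget(m, n, compression=4)   # 65536 // 1024 = 64
# Rank 64 stores 64 * 1024 = 65536 weights vs. 262144 originally: 4x smaller.
```

Selecting a different r per matrix, as Rank-Tuning does, lets sensitive layers keep higher rank while redundant ones are compressed aggressively.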
We study the performance of empirical risk minimization on the $p$-norm
linear regression problem for $p \in (1, \infty)$. We show that, in the
realizable case, under no moment assumptions, and up to a
distribution-dependent constant, $O(d)$ samples are enough to exactly recover
the target. Otherwise, for $p \in [2, \infty)$, and under weak moment
assumptions on the target and the covariates, we prove a high probability
excess risk bound on the empirical risk minimizer whose leading term matches,
up to a constant that depends only on $p$, the asymptotically exact rate. We
extend this result to the case $p \in (1, 2)$ under mild assumptions that
guarantee the existence of the Hessian of the risk at its minimizer.
We initiate a novel approach to explaining the out-of-sample performance of
random forest (RF) models by exploiting the fact that any RF can be formulated
as an adaptive weighted K-nearest-neighbors model. Specifically, we use the
proximity between points in the feature space learned by the RF to re-write
random forest predictions exactly as a weighted average of the target labels of
training data points. This linearity facilitates a local notion of
explainability of RF predictions that generates attributions for any model
prediction across observations in the training set, and thereby complements
established methods like SHAP, which instead generate attributions for a model
prediction across dimensions of the feature space. We demonstrate this approach
in the context of a bond pricing model trained on US corporate bond trades, and
compare our approach to various existing approaches to model explainability.
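The identity behind this linearity: each tree predicts the mean label of the training points sharing the query's leaf, so every forest prediction is a weighted average of training labels with proximity weights. A toy verification with a hypothetical "forest" of two one-threshold stumps on 1-d data:

```python
# A regression tree predicts the mean label of the training points sharing
# the query's leaf, so its prediction is a weighted average of training
# labels. Toy forest of 1-d threshold stumps (hypothetical data) verifying
# that the proximity-weight formulation reproduces predictions exactly.

x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [10.0, 12.0, 30.0, 34.0]
thresholds = [2.5, 3.5]              # each threshold defines one "tree"

def leaf(t, x):                      # which side of the split x falls on
    return x <= t

def forest_predict(x):
    preds = []
    for t in thresholds:
        members = [y for xi, y in zip(x_train, y_train)
                   if leaf(t, xi) == leaf(t, x)]
        preds.append(sum(members) / len(members))
    return sum(preds) / len(preds)

def proximity_weights(x):
    # w_i averages, over trees, 1/|leaf| if training point i shares x's leaf.
    w = [0.0] * len(x_train)
    for t in thresholds:
        share = [leaf(t, xi) == leaf(t, x) for xi in x_train]
        size = sum(share)
        for i, s in enumerate(share):
            if s:
                w[i] += 1.0 / (size * len(thresholds))
    return w

x = 3.2
weighted = sum(w * y for w, y in zip(proximity_weights(x), y_train))
# weighted equals forest_predict(x): predictions are linear in train labels.
```

The weights themselves are the attribution: they say exactly which training observations drive a given prediction, and by how much.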
Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further sped up the need of ML adoption across industries. However, implementing security, data privacy, and governance controls are still key challenges faced by customers when implementing ML […]
This is a guest post co-written by Rama Badrinath, Divay Jindal and Utkarsh Agrawal at Meesho. Meesho is India’s fastest growing ecommerce company with a mission to democratize internet commerce for everyone and make it accessible to the next billion users of India. Meesho was founded in 2015 and today focuses on buyers and sellers […]
GPU-powered surgical-simulation devices are helping train more than 2,000 doctors a year in lower-income countries to treat cataract blindness, the world’s leading cause of blindness, thanks to the nonprofit HelpMeSee. While cataract surgery has a success rate of around 99%, many patients in low- and middle-income countries lack access to the common procedure due to […]
A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can. The stunning prestidigitation, showcased in the video above, is one of nearly 30 tasks that robots have learned to expertly […]
Companies increasingly rely on user-generated images and videos for engagement. From ecommerce platforms encouraging customers to share product images to social media companies promoting user-generated videos and images, using user content for engagement is a powerful strategy. However, it can be challenging to ensure that this user-generated content is consistent with your policies and fosters […]
High-resolution imagery is very prevalent in today’s world, from satellite imagery to drones and DLSR cameras. From this imagery, we can capture damage due to natural disasters, anomalies in manufacturing equipment, or very small defects such as defects on printed circuit boards (PCBs) or semiconductors. Building anomaly detection models using high-resolution imagery can be challenging […]
Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII). To ensure customer privacy and maintain regulatory compliance while training, fine-tuning, and using deep learning models, […]
To enable professionals worldwide to build and run AI applications right from their desktops, NVIDIA and AMD are powering a new line of workstations equipped with NVIDIA RTX Ada Generation GPUs and AMD Ryzen Threadripper PRO 7000 WX-Series CPUs. Bringing together the highest levels of AI computing, rendering and simulation capabilities, these new platforms enable […]
Training generative AI models just got easier. NVIDIA DGX Cloud AI supercomputing platform and NVIDIA AI Enterprise software are now available in Oracle Cloud Marketplace, making it possible for Oracle Cloud Infrastructure customers to access high-performance accelerated computing and software to run secure, stable and supported production AI in just a few clicks. The addition […]
Rush to the cloud — stream Counter-Strike 2 on GeForce NOW for the highest frame rates. Members can play through the newest chapter of Valve’s elite, competitive, first-person shooter from the cloud. It’s all part of an action-packed GFN Thursday, with 22 more games joining the cloud gaming platform’s library, including Hot Wheels Unleashed 2 […]
We developed a safety mitigation stack to ready DALL·E 3 for wider release and are sharing updates on our provenance research.
AI models that prioritize similarity falter when asked to design something completely new.
The award honors research on public policy with a focus on economic and governmental reforms.
Purina US, a subsidiary of Nestlé, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes. Purina consistently […]
This position research paper was presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (opens in new tab) (CSCW 2023), a premier venue for research on the design and use of technologies that affect groups, organizations, and communities. In the business world, measuring success is as critical as selecting the right […]
The post Understanding the user: How the Enterprise System Usability Scale aligns with user reality appeared first on Microsoft Research.
Powerful generative AI models and cloud-native APIs and microservices are coming to the edge. Generative AI is bringing the power of transformer models and large language models to virtually every industry. That reach now includes areas that touch edge, robotics and logistics systems: defect detection, real-time asset tracking, autonomous planning and navigation, human-robot interactions and […]
Artificial intelligence is now a household term. Responsible AI is hot on its heels. Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous. In the latest episode of the NVIDIA AI Podcast, host Noah […]
Real-time rendering, animation and texture baking are essential workflows for 3D art production. Using the Marmoset Toolbag software, 3D artists can enhance their creative workflows and build complex 3D models without disruptions to productivity.
NVIDIA founder and CEO Jensen Huang joined Hon Hai (Foxconn) Chairman and CEO Young Liu to unveil the latest in their ongoing partnership to develop the next wave of intelligent electric vehicle (EV) platforms for the global automotive market. This latest move, announced today at the fourth annual Hon Hai Tech Day in Taiwan, will […]
Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in […]
At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create […]
Amazon Personalize has launched a new integration with Amazon OpenSearch Service that enables you to personalize search results for each user and assists in predicting their search needs. The Amazon Personalize Search Ranking plugin within OpenSearch Service allows you to improve the end-user engagement and conversion from your website and app search by taking advantage […]
GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.
NVIDIA today announced an update to RTX Video Super Resolution (VSR) that delivers greater overall graphical fidelity with preserved details, upscaling for native videos and support for GeForce RTX 20 Series GPUs.
Researchers coaxed a family of generative AI models to work together to solve multistep robot manipulation problems.
Some researchers see formal specifications as a way for autonomous systems to "explain themselves" to humans. But a new study finds that we aren't understanding.
Veriff is an identity verification platform partner for innovative growth-driven organizations, including pioneers in financial services, FinTech, crypto, gaming, mobility, and online marketplaces. In this post, we show you how Veriff standardized their model deployment workflow using Amazon SageMaker, reducing costs and development time.
How trustworthy are generative pre-trained transformer (GPT) models? To answer this question, University of Illinois Urbana-Champaign, together with Stanford University, University of California, Berkeley, Center for AI Safety, and Microsoft Research, released a comprehensive trustworthiness evaluation platform for large language models (LLMs), which is presented in the recent paper: DecodingTrust: A Comprehensive Assessment of Trustworthiness […]
The post DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models appeared first on Microsoft Research.
Similar to my article series on adversarial robustness, I was planning to have a series on bit errors robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub.
The post Benchmarking Bit Errors in Quantized Neural Networks with PyTorch appeared first on David Stutz.
Maximum-type statistics of certain functions of the sample covariance matrix
of high-dimensional vector time series are studied to statistically confirm or
reject the null hypothesis that a data set has been collected under normal
conditions. The approach generalizes the case of the maximal deviation of the
sample autocovariance function from its assumed values. Within a linear time
series framework it is shown that Gumbel-type extreme value asymptotics hold
true. As applications we discuss long-only minimal-variance portfolio
optimization and subportfolio analysis with respect to idiosyncratic risks, ETF
index tracking by sparse tracking portfolios, convolutional deep learners for
image analysis and the analysis of array-of-sensors data.
The exploration of transition state (TS) geometries is crucial for
elucidating chemical reaction mechanisms and modeling their kinetics. Recently,
machine learning (ML) models have shown remarkable performance for prediction
of TS geometries. However, they require 3D conformations of reactants and
products often with their appropriate orientations as input, which demands
substantial efforts and computational cost. Here, we propose a generative
approach based on the stochastic diffusion method, namely TSDiff, for
prediction of TS geometries just from 2D molecular graphs. TSDiff outperformed
the existing ML models with 3D geometries in terms of both accuracy and
efficiency. Moreover, it enables sampling of various TS conformations, because
it learned the distribution of TS geometries for diverse reactions during training.
Thus, TSDiff was able to find more favorable reaction pathways with lower
barrier heights than those in the reference database. These results demonstrate
that TSDiff shows promising potential for an efficient and reliable TS
exploration.
This paper introduces a novel model-agnostic algorithm called adaptive
ensemble batch multi-input multi-output conformalized quantile regression
(AEnbMIMOCQR) that enables forecasters to generate multi-step-ahead prediction
intervals for a fixed pre-specified miscoverage rate in a distribution-free
manner. Our method is grounded on conformal prediction principles, however, it
does not require data splitting and provides close to exact coverage even when
the data is not exchangeable. Moreover, the resulting prediction intervals,
besides being empirically valid along the forecast horizon, do not neglect
heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution
shifts, which means that its prediction intervals remain reliable over an
unlimited period of time, without entailing retraining or imposing unrealistic
strict assumptions on the data-generating process. Through methodical
experimentation, we demonstrate that our approach outperforms other competitive
methods on both real-world and synthetic datasets. The code used in the
experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the
following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.
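The conformal principle the method builds on can be seen in its simplest form, split conformalized quantile regression: inflate a base quantile band by the calibration quantile of a conformity score. A toy sketch with a hypothetical fixed-width base model (note AEnbMIMOCQR itself avoids the data split used here):

```python
import random

# Standard split conformalized quantile regression (the principle the paper
# builds on; AEnbMIMOCQR avoids the data split used in this sketch).
random.seed(3)
alpha = 0.1                                   # target miscoverage

# Hypothetical base quantile "model": fixed lower/upper guesses for y.
def q_lo(x): return x - 1.0
def q_hi(x): return x + 1.0

# Calibration data: y = x + noise wider than the naive +/- 1 band.
cal = [(x, x + random.gauss(0, 1.0))
       for x in (random.uniform(0, 10) for _ in range(2000))]

# Conformity score: how far y falls outside the predicted band.
scores = sorted(max(q_lo(x) - y, y - q_hi(x)) for x, y in cal)
k = min(len(scores) - 1, int((1 - alpha) * (len(scores) + 1)))
qhat = scores[k]                              # calibration quantile

# Inflate the band by qhat; coverage on fresh data approaches 1 - alpha.
test = [(x, x + random.gauss(0, 1.0))
        for x in (random.uniform(0, 10) for _ in range(2000))]
covered = sum(q_lo(x) - qhat <= y <= q_hi(x) + qhat
              for x, y in test) / len(test)
```

Extending this guarantee to multi-step horizons without splitting, and keeping it under distribution shift, is the contribution the abstract describes.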
Reinforcement Learning (RL) environments can produce training data with
spurious correlations between features due to the amount of training data or
its limited feature coverage. This can lead to RL agents encoding these
misleading correlations in their latent representation, preventing the agent
from generalising if the correlation changes within the environment or when
deployed in the real world. Disentangled representations can improve
robustness, but existing disentanglement techniques that minimise mutual
information between features require independent features, thus they cannot
disentangle correlated features. We propose an auxiliary task for RL algorithms
that learns a disentangled representation of high-dimensional observations with
correlated features by minimising the conditional mutual information between
features in the representation. We demonstrate experimentally, using continuous
control tasks, that our approach improves generalisation under correlation
shifts, as well as improving the training performance of RL algorithms in the
presence of correlated features.
Hierarchical time series are common in several applied fields. The forecasts
for these time series are required to be coherent, that is, to satisfy the
constraints given by the hierarchy. The most popular technique to enforce
coherence is called reconciliation, which adjusts the base forecasts computed
for each time series. However, recent works on probabilistic reconciliation
present several limitations. In this paper, we propose a new approach based on
conditioning to reconcile any type of forecast distribution. We then introduce
a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample
from the reconciled distribution. It can be used for any base forecast
distribution: discrete, continuous, or in the form of samples, providing a
major speedup compared to the current methods. Experiments on several temporal
hierarchies show a significant improvement over base probabilistic forecasts.
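The coherence constraint in the simplest two-level hierarchy: the total series must equal the sum of its bottom series. Sampling bottom-level forecasts and summing them yields coherent samples by construction, as this simplified bottom-up illustration shows (hypothetical distributions; not the paper's Bottom-Up Importance Sampling algorithm, which additionally reweights samples):

```python
import random

# Coherence in a two-level hierarchy: the total must equal the sum of the
# bottom series. Summing sampled bottom forecasts gives coherent samples
# by construction (simplified illustration only).

random.seed(4)

# Hypothetical bottom-level forecast distributions (two stores).
def sample_bottom():
    return [random.gauss(100, 10), random.gauss(50, 5)]

samples = []
for _ in range(1000):
    bottom = sample_bottom()
    total = sum(bottom)                 # coherent by construction
    samples.append((total, bottom))

# Every sample satisfies the hierarchy constraint exactly.
incoherent = [s for s in samples if abs(s[0] - sum(s[1])) > 1e-9]
```

Reconciliation methods go further: they adjust such samples so the coherent distribution also reflects the information in the base forecast of the total.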
Neural ordinary differential equations (neural ODEs) are a popular family of
continuous-depth deep learning models. In this work, we consider a large family
of parameterized ODEs with continuous-in-time parameters, which include
time-dependent neural ODEs. We derive a generalization bound for this class by
a Lipschitz-based argument. By leveraging the analogy between neural ODEs and
deep residual networks, our approach yields in particular a generalization
bound for a class of deep residual networks. The bound involves the magnitude
of the difference between successive weight matrices. We illustrate numerically
how this quantity affects the generalization capability of neural networks.
Literature-Based Discovery (LBD) aims to discover new scientific knowledge by
mining papers and generating hypotheses. Standard LBD is limited to predicting
pairwise relations between discrete concepts (e.g., drug-disease links), and
ignores critical contexts like experimental settings (e.g., a specific patient
population where a drug is evaluated) and background motivations (e.g., to find
drugs without specific side effects). We address these limitations with a novel
formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in
natural language, while grounding them in a context that controls the
hypothesis search space. We present a modeling framework using retrieval of
``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4
tends to generate ideas with overall low technical depth and novelty, while our
inspiration prompting approaches partially mitigate this issue. Our work
represents a first step toward building language models that generate new ideas
derived from scientific literature.
Evaluating the adversarial robustness of machine learning models using
gradient-based attacks is challenging. In this work, we show that
hyperparameter optimization can improve fast minimum-norm attacks by automating
the selection of the loss function, the optimizer and the step-size scheduler,
along with the corresponding hyperparameters. Our extensive evaluation
involving several robust models demonstrates the improved efficacy of fast
minimum-norm attacks when combined with hyperparameter optimization. We release
our open-source code at https://github.com/pralab/HO-FMN.
This paper presents a method to efficiently classify the gastroenterologic
section of images derived from Video Capsule Endoscopy (VCE) studies by
exploring the combination of a Convolutional Neural Network (CNN) for
classification with the time-series analysis properties of a Hidden Markov
Model (HMM). It is demonstrated that successive time-series analysis identifies
and corrects errors in the CNN output. Our approach achieves an accuracy of
$98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for
precise localization within the gastrointestinal (GI) tract while requiring
only approximately 1M parameters and thus provides a method suitable for
low-power devices.
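The CNN+HMM idea in miniature: treat per-frame class probabilities as emissions of an HMM with sticky transitions, then Viterbi-decode so isolated misclassifications are overruled by temporal context. A toy two-state example with hypothetical probabilities (not the paper's model):

```python
import math

# Smoothing noisy per-frame classifier outputs with a sticky 2-state HMM via
# Viterbi decoding (toy illustration of the CNN+HMM idea; probabilities are
# hypothetical). Frame 3 is an isolated classifier error.

p_state1 = [0.1, 0.1, 0.1, 0.85, 0.1, 0.9, 0.85, 0.9]

stay = 0.8                       # sticky transitions discourage flip-flops
log_trans = [[math.log(stay), math.log(1 - stay)],
             [math.log(1 - stay), math.log(stay)]]

def viterbi(probs):
    # Log-emissions for states 0 and 1 at each frame, uniform initial state.
    emit = [[math.log(1 - p), math.log(p)] for p in probs]
    score = [emit[0][0] + math.log(0.5), emit[0][1] + math.log(0.5)]
    back = []
    for e in emit[1:]:
        prev, new = [], []
        for s in range(2):
            cands = [score[r] + log_trans[r][s] for r in range(2)]
            r = cands.index(max(cands))
            prev.append(r)
            new.append(cands[r] + e[s])
        back.append(prev)
        score = new
    path = [score.index(max(score))]
    for prev in reversed(back):     # follow backpointers to recover the path
        path.append(prev[path[-1]])
    return path[::-1]

smoothed = viterbi(p_state1)
# Frame-wise argmax gives [0,0,0,1,0,1,1,1]; Viterbi corrects the isolated
# frame-3 spike and yields [0,0,0,0,0,1,1,1].
```

The sustained run at the end survives smoothing because its cumulative evidence outweighs the single transition cost, while the one-frame spike does not.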
Bayesian Optimization (BO) is typically used to optimize an unknown function
$f$ that is noisy and costly to evaluate, by exploiting an acquisition function
that must be maximized at each optimization step. Even if provably
asymptotically optimal BO algorithms are efficient at optimizing
low-dimensional functions, scaling them to high-dimensional spaces remains an
open problem, often tackled by assuming an additive structure for $f$. By doing
so, BO algorithms typically introduce additional restrictive assumptions on the
additive structure that reduce their applicability domain. This paper contains
two main contributions: (i) we relax the restrictive assumptions on the
additive structure of $f$, at the expense of weakening the maximization
guarantees of the acquisition function, and (ii) we address the
over-exploration problem for decentralized BO algorithms. To these ends, we
propose DumBO, an asymptotically optimal decentralized BO algorithm that
achieves very competitive performance against state-of-the-art BO algorithms,
especially when the additive structure of $f$ comprises high-dimensional
factors.
Model identification of battery dynamics is a central problem in energy
research; many energy management systems and design processes rely on accurate
battery models for efficiency optimization. The standard methodology for
battery modelling is traditional design of experiments (DoE), where the battery
dynamics are excited with many different current profiles and the measured
outputs are used to estimate the system dynamics. However, although it is
possible to obtain useful models with the traditional approach, the process is
time consuming and expensive because of the need to sweep many different
current-profile configurations. In the present work, a novel DoE approach is
developed based on deep reinforcement learning, which alters the configuration
of the experiments on the fly based on the statistics of past experiments.
Instead of sticking to a library of predefined current profiles, the proposed
approach modifies the current profiles dynamically by updating the output space
covered by past measurements, hence only the current profiles that are
informative for future experiments are applied. Simulations and real
experiments are used to show that the proposed approach gives models that are
as accurate as those obtained with traditional DoE but by using 85\% less
resources.
We consider the problem of model selection in a high-dimensional sparse
linear regression model under the differential privacy framework. In
particular, we consider the problem of differentially private best subset
selection and study its utility guarantee. We adopt the well-known exponential
mechanism for selecting the best model, and under a certain margin condition,
we establish its strong model recovery property. However, the exponential
search space of the exponential mechanism poses a serious computational
bottleneck. To overcome this challenge, we propose a Metropolis-Hastings
algorithm for the sampling step and establish that its mixing time to the
stationary distribution is polynomial in the problem parameters $n$, $p$, and
$s$. Furthermore,
we also establish approximate differential privacy for the final estimates of
the Metropolis-Hastings random walk using its mixing property. Finally, we also
perform some illustrative simulations that echo the theoretical findings of our
main results.
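The sampling step can be sketched as a Metropolis-Hastings walk over size-$s$ subsets whose stationary distribution is the exponential mechanism. The data, privacy budget `eps`, sensitivity, and utility `score` below are toy stand-ins for illustration, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n samples, p features, true support {0, 1} (illustrative only).
n, p, s = 100, 10, 2
X = rng.standard_normal((n, p))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n)

def score(subset):
    """Utility of a model: negative residual sum of squares of an OLS fit."""
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return -np.sum((y - Xs @ beta) ** 2)

eps, sensitivity = 1.0, 1.0  # assumed privacy budget and utility sensitivity

# Metropolis-Hastings over size-s subsets: propose swapping one feature.
current = set(rng.choice(p, s, replace=False))
for _ in range(2000):
    out_f = rng.choice(sorted(current))
    in_f = rng.choice(sorted(set(range(p)) - current))
    proposal = (current - {out_f}) | {in_f}
    # Exponential-mechanism target: density proportional to exp(eps * u / (2 * Delta)).
    log_acc = eps * (score(proposal) - score(current)) / (2 * sensitivity)
    if np.log(rng.random()) < log_acc:
        current = proposal

print(sorted(current))
```

With a strong signal, the walk concentrates on the true support quickly; the mixing-time analysis in the abstract quantifies this in $n$, $p$, and $s$.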
( 2
min )
This paper proposes a set of novel optimization algorithms for solving a
class of convex optimization problems with time-varying streaming cost
function. We develop an approach to track the optimal solution with a bounded
error. Unlike existing results, our algorithm uses only the first-order
derivatives of the cost function, which makes it computationally efficient for
optimization with a time-varying cost function. We compare our
algorithms to the gradient descent algorithm and show why gradient descent is
not an effective solution for optimization problems with time-varying cost.
Several examples including solving a model predictive control problem cast as a
convex optimization problem with a streaming time-varying cost function
demonstrate our results.
( 2
min )
We investigate the problem of stochastic, combinatorial multi-armed bandits
where the learner only has access to bandit feedback and the reward function
can be non-linear. We provide a general framework for adapting discrete offline
approximation algorithms into sublinear $\alpha$-regret methods that only
require bandit feedback, achieving
$\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative
$\alpha$-regret dependence on the horizon $T$. The framework only requires the
offline algorithms to be robust to small errors in function evaluation. The
adaptation procedure does not even require explicit knowledge of the offline
approximation algorithm -- the offline algorithm can be used as a black box
subroutine. To demonstrate its utility, the proposed framework is applied to
diverse applications in submodular maximization. The
new CMAB algorithms for submodular maximization with knapsack constraints
outperform a full-bandit method developed for the adversarial setting in
experiments with real-world data.
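The black-box adaptation idea can be sketched with an explore-then-commit style wrapper: each value-oracle query made by an offline greedy algorithm is answered by averaging repeated noisy (bandit) pulls. This is a toy illustration, not the paper's exact procedure; the coverage-style reward `f` and the noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stochastic monotone reward over subsets of ground set [n] (coverage-like).
n, k = 6, 3
weights = rng.random(n)
f = lambda S: 1 - np.prod([1 - weights[i] for i in S]) if S else 0.0
noisy = lambda S: f(S) + 0.05 * rng.standard_normal()   # bandit feedback

def greedy(value, k):
    """Offline alpha-approximation used as a black box; only needs a value oracle."""
    S = []
    for _ in range(k):
        best = max((i for i in range(n) if i not in S),
                   key=lambda i: value(S + [i]))
        S.append(best)
    return S

# Explore-then-commit adaptation: answer each oracle query with an average of
# m noisy pulls, then play the returned set for the remaining rounds.
m = 50
est = lambda S: np.mean([noisy(S) for _ in range(m)])
S_hat = greedy(est, k)
print(sorted(S_hat), round(f(S_hat), 3))
```

Because greedy only needs approximately correct value comparisons, the averaged estimates suffice, which is the "robust to small errors in function evaluation" requirement in the abstract.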
( 3
min )
Neural network pruning has been shown to be an effective technique for reducing
the network size, trading desirable properties like generalization and
robustness to adversarial attacks for higher sparsity. Recent work has claimed
that adversarial pruning methods can produce sparse networks while also
preserving robustness to adversarial examples. In this work, we first
re-evaluate three state-of-the-art adversarial pruning methods, showing that
their robustness was indeed overestimated. We then compare pruned and dense
versions of the same models, discovering that samples on thin ice, i.e., closer
to the unpruned model's decision boundary, are typically misclassified after
pruning. We conclude by discussing how this intuition may lead to designing
more effective adversarial pruning methods in future work.
( 2
min )
Document-level relation extraction (DocRE) aims to extract relations of all
entity pairs in a document. A key challenge in DocRE is the cost of annotating
such data, which requires intensive human effort. Thus, we investigate the case
of DocRE in a low-resource setting, and we find that existing models trained on
low data overestimate the NA ("no relation") label, causing limited
performance. In this work, we approach the problem from a calibration
perspective and propose PRiSM, which learns to adapt logits based on relation
semantic information. We evaluate our method on three DocRE datasets and
demonstrate that integrating existing models with PRiSM improves performance by
as much as 26.38 F1 points, while the calibration error drops as much as 36
times when trained with about 3% of data. The code is publicly available at
https://github.com/brightjade/PRiSM.
( 2
min )
We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and
adaptively provides high-quality intrinsic rewards to enhance exploration in
reinforcement learning (RL). More specifically, AIRS selects a shaping function
from a predefined set in real time, based on the estimated task return,
providing reliable exploration incentives and alleviating the biased objective
problem. Moreover, we develop an intrinsic reward toolkit to provide efficient
and reliable implementations of diverse intrinsic reward approaches. We test
AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite.
Extensive simulation demonstrates that AIRS can outperform the benchmarking
schemes and achieve superior performance with simple architecture.
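A minimal sketch of the selection idea, assuming a UCB-style bandit over candidate intrinsic-reward functions (the actual AIRS selection rule may differ; the candidate names and toy returns below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "arm" is an intrinsic-reward shaping function; the arm whose episodes
# yielded the highest estimated task return (plus an exploration bonus) is
# chosen for the next episode.
reward_fns = ["icm", "rnd", "re3"]          # hypothetical candidate set
counts = np.zeros(3)
returns = np.zeros(3)

def select(t):
    mean = returns / np.maximum(counts, 1)
    bonus = np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    return int(np.argmax(mean + bonus))

true_means = np.array([0.3, 0.8, 0.5])      # unknown task return per arm (toy)
for t in range(3000):
    a = select(t)
    g = true_means[a] + 0.1 * rng.standard_normal()   # observed episode return
    counts[a] += 1
    returns[a] += g

print(reward_fns[int(np.argmax(counts))])
```

The selector concentrates on whichever shaping function actually improves the task return, which is how the biased-objective problem of a fixed intrinsic reward is alleviated.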
( 2
min )
In the realm of robotics, numerous downstream robotics tasks leverage machine
learning methods for processing, modeling, or synthesizing data. Often, this
data comprises variables that inherently carry geometric constraints, such as
the unit-norm condition of quaternions representing rigid-body orientations or
the positive definiteness of stiffness and manipulability ellipsoids. Handling
such geometric constraints effectively requires the incorporation of tools from
differential geometry into the formulation of machine learning methods. In this
context, Riemannian manifolds emerge as a powerful mathematical framework to
handle such geometric constraints. Nevertheless, their recent adoption in robot
learning has been largely characterized by a mathematically flawed
simplification, hereinafter referred to as the ``single tangent space fallacy''.
This approach involves merely projecting the data of interest onto a single
tangent (Euclidean) space, over which an off-the-shelf learning algorithm is
applied. This paper provides a theoretical elucidation of various
misconceptions surrounding this approach and offers experimental evidence of
its shortcomings. Finally, it presents valuable insights to promote best
practices when employing Riemannian geometry within robot learning
applications.
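The fallacy can be made concrete on the unit sphere: distances computed inside a single tangent space at a fixed base point distort the true geodesic distances between faraway points. A minimal sketch using the standard exponential and logarithmic maps:

```python
import numpy as np

# Sphere S^{d-1}: exponential and logarithmic maps at a base point p.
def exp_map(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def log_map(p, q):
    u = q - np.dot(p, q) * p
    nu = np.linalg.norm(u)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros_like(p) if nu < 1e-12 else theta * u / nu

# The "single tangent space fallacy": distances measured in the tangent space
# at one fixed base point distort the true geodesic distances between points
# lying far from that base.
p = np.array([1.0, 0.0, 0.0])
q1 = np.array([0.0, 1.0, 0.0])
q2 = np.array([0.0, 0.0, 1.0])
geodesic = np.arccos(np.clip(np.dot(q1, q2), -1, 1))       # true distance pi/2
tangent = np.linalg.norm(log_map(p, q1) - log_map(p, q2))  # distorted distance
print(round(geodesic, 3), round(tangent, 3))
```

A learning algorithm run entirely in the tangent space at `p` optimizes the distorted metric; methods that move between tangent spaces via `exp_map`/`log_map` avoid this.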
( 2
min )
In this study, we present a graph neural network-based learning approach
using an autoencoder setup to derive low-dimensional variables from features
observed in experimental crystal structures. These variables are then biased in
enhanced sampling to observe state-to-state transitions and reliable
thermodynamic weights. Our approach uses simple convolution and pooling
methods. To verify the effectiveness of our protocol, we examined the
nucleation of various allotropes and polymorphs of iron and glycine from their
molten states. Our graph latent variables, when biased in well-tempered
metadynamics, consistently show transitions between states and achieve accurate
free energy calculations in agreement with experiments, both of which are
indicators of dependable sampling. This underscores the strength and promise of
our graph neural net variables for improved sampling. The protocol shown here
should be applicable for other systems and with other sampling methods.
( 2
min )
For over two decades, detecting rare events has been a challenging task among
researchers in the data mining and machine learning domain. Real-life problems
inspire researchers to navigate and further improve data processing and
algorithmic approaches to achieve effective and computationally efficient
methods for imbalanced learning. In this paper, we have collected and reviewed
258 peer-reviewed papers from archival journals and conference proceedings in an
attempt to provide an in-depth review of various approaches in imbalanced
learning from technical and application perspectives. This work aims to provide
a structured review of methods used to address the problem of imbalanced data
in various domains and create a general guideline for researchers in academia
or industry who want to dive into the broad field of machine learning using
large-scale imbalanced data.
( 2
min )
This paper discusses the predictive performance and processes undertaken on
flight pricing data, evaluated using R^2 (r-squared) and RMSE, leveraging a
large dataset, originally from Expedia.com, consisting of approximately 20
million records, or 4.68 gigabytes. The project aims to determine the best models usable
in the real world to predict airline ticket fares for non-stop flights across
the US. Therefore, good generalization capability and optimized processing
times are important measures for the model.
We will discover key business insights utilizing feature importance and
discuss the process and tools used for our analysis. Four regression machine
learning algorithms were utilized: Random Forest, Gradient Boost Tree, Decision
Tree, and Factorization Machines utilizing Cross Validator and Training
Validator functions for assessing performance and generalization capability.
( 2
min )
Matrix-variate distributions are a recent addition to the model-based
clustering field, thereby making it possible to analyze data in matrix form
with complex structure, such as images and time series. Due to their recent
appearance, there is limited literature on matrix-variate data, with even less
on dealing with outliers in these models. An approach for clustering
matrix-variate normal data with outliers is discussed. The approach, which uses
the distribution of subset log-likelihoods, extends the OCLUST algorithm to
matrix-variate normal data and uses an iterative approach to detect and trim
outliers.
( 2
min )
This study proposes an interpretable neural network-based non-proportional
odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from
conventional approaches to ordinal regression with non-proportional models in
several ways: (1) N$^3$POM is designed to directly handle continuous responses,
whereas standard methods typically treat de facto ordered continuous variables
as discrete, (2) instead of estimating response-dependent finite coefficients
of linear models from discrete responses as is done in conventional approaches,
we train a non-linear neural network to serve as a coefficient function. Thanks
to the neural network, N$^3$POM offers flexibility while preserving the
interpretability of conventional ordinal regression. We establish a sufficient
condition under which the predicted conditional cumulative probability locally
satisfies the monotonicity constraint over a user-specified region in the
covariate space. Additionally, we provide a monotonicity-preserving stochastic
(MPS) algorithm for effectively training the neural network. We apply N$^3$POM
to several real-world datasets.
( 2
min )
Neural ordinary differential equations (neural ODEs) are a popular family of
continuous-depth deep learning models. In this work, we consider a large family
of parameterized ODEs with continuous-in-time parameters, which include
time-dependent neural ODEs. We derive a generalization bound for this class by
a Lipschitz-based argument. By leveraging the analogy between neural ODEs and
deep residual networks, our approach yields in particular a generalization
bound for a class of deep residual networks. The bound involves the magnitude
of the difference between successive weight matrices. We illustrate numerically
how this quantity affects the generalization capability of neural networks.
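The quantity in the bound, the total variation of the weights across depth $\sum_k \|W_{k+1}-W_k\|$, can be computed directly; a sketch comparing smoothly varying (ODE-like) weights with independent layers:

```python
import numpy as np

rng = np.random.default_rng(0)

# The generalization bound involves the magnitude of successive weight
# differences, sum_k ||W_{k+1} - W_k||. Networks whose weights vary smoothly
# with depth (as in a discretized neural ODE) keep this quantity small.
L, d = 20, 8
W_smooth = np.cumsum(0.01 * rng.standard_normal((L, d, d)), axis=0)  # slowly varying
W_rough = rng.standard_normal((L, d, d))                              # independent layers

tv = lambda W: sum(np.linalg.norm(W[k + 1] - W[k], 2) for k in range(L - 1))
print(round(tv(W_smooth), 2), round(tv(W_rough), 2))
```

The smooth-in-depth weights have a far smaller total variation, matching the intuition that continuous-in-time parameterizations generalize well.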
( 2
min )
Hierarchical time series are common in several applied fields. The forecasts
for these time series are required to be coherent, that is, to satisfy the
constraints given by the hierarchy. The most popular technique to enforce
coherence is called reconciliation, which adjusts the base forecasts computed
for each time series. However, recent works on probabilistic reconciliation
present several limitations. In this paper, we propose a new approach based on
conditioning to reconcile any type of forecast distribution. We then introduce
a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample
from the reconciled distribution. It can be used for any base forecast
distribution: discrete, continuous, or in the form of samples, providing a
major speedup compared to the current methods. Experiments on several temporal
hierarchies show a significant improvement over base probabilistic forecasts.
( 2
min )
Maximum-type statistics of certain functions of the sample covariance matrix
of high-dimensional vector time series are studied to statistically confirm or
reject the null hypothesis that a data set has been collected under normal
conditions. The approach generalizes the case of the maximal deviation of the
sample autocovariance function from its assumed values. Within a linear time
series framework it is shown that Gumbel-type extreme value asymptotics holds
true. As applications, we discuss long-only minimal-variance portfolio
optimization and subportfolio analysis with respect to idiosyncratic risks, ETF
index tracking by sparse tracking portfolios, convolutional deep learners for
image analysis and the analysis of array-of-sensors data.
( 2
min )
Transformers pretrained on diverse tasks exhibit remarkable in-context
learning (ICL) capabilities, enabling them to solve unseen tasks solely based
on input contexts without adjusting model parameters. In this paper, we study
ICL in one of its simplest setups: pretraining a linearly parameterized
single-layer linear attention model for linear regression with a Gaussian
prior. We establish a statistical task complexity bound for the attention model
pretraining, showing that effective pretraining only requires a small number of
independent tasks. Furthermore, we prove that the pretrained model closely
matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by
achieving nearly Bayes optimal risk on unseen tasks under a fixed context
length. These theoretical findings complement prior experimental research and
shed light on the statistical foundations of ICL.
( 2
min )
Obtaining continuously updated predictions is a major challenge for
personalised medicine. Leveraging combinations of parametric regressions and
machine learning approaches, the personalised online super learner (POSL) can
achieve such dynamic and personalised predictions. We adapt POSL to predict a
repeated continuous outcome dynamically and propose a new way to validate such
personalised or dynamic prediction models. We illustrate its performance by
predicting the convection volume of patients undergoing hemodiafiltration. POSL
outperformed its candidate learners with respect to median absolute error,
calibration-in-the-large, discrimination, and net benefit. We finally discuss
the choices and challenges underlying the use of POSL.
( 2
min )
Companies often pay more attention to automation and innovation than to proficiency and productivity. However, firms can maintain a balance between both through the extensive use of AI and data science programs. Here are the stats that show the impact of AI and data science in diverse sectors: Applications of AI and data science have…
The post Future of AI and data science – How to secure a bright career appeared first on Data Science Central.
( 21
min )
At SHoP Architects, a New York City-based architectural firm, Mengyi Fan and her team aim to inspire industry professionals to create visual masterpieces by incorporating emerging technologies. Fan, the director of visualization at SHoP, has expertise that spans the fields of architectural visualization and design. She takes a definitive, novel and enduring approach to designing…
( 6
min )
Posted by Nicholas Rubin, Senior Research Scientist, and Ryan Babbush, Head of Quantum Algorithms, Quantum AI Team
If you’ve paid attention to the quantum computing space, you’ve heard the claim that in the future, quantum computers will solve certain problems exponentially more efficiently than classical computers can. They have the potential to transform many industries, from pharmaceuticals to energy.
For the most part, these claims have rested on arguments about the asymptotic scaling of algorithms as the problem size approaches infinity, but this tells us very little about the practical performance of quantum computers for finite-sized problems. We want to be more concrete: Exactly which problems are quantum computers more suited to tackle than their classical counterparts, an…
( 94
min )
At one of the U.K.’s largest technology festivals, top enterprises and startups are this week highlighting their latest innovations, hosting workshops and celebrating the growing tech ecosystem based in the country’s southwest. The Bristol Technology Festival today showcased the work of nine startups that recently participated in a challenge hosted by Digital Catapult — the…
( 6
min )
Put the pedal to the metal this GFN Thursday as Forza Motorsport leads 23 new games in the cloud. Plus, Acer’s Predator Connect 6E is the newest addition to the GeForce NOW Recommended program, with easy cloud gaming quality-of-service (QoS) settings built in to give Ultimate members the best streaming experience. No Breaks, No Limits,…
( 6
min )
These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication. Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities […]
The post Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets appeared first on Microsoft Research.
( 10
min )
What is the optimal framework and configuration for hosting large language models (LLMs) for text-generating generative AI applications? Despite the abundance of options for serving LLMs, this is a hard question to answer due to the size of the models, varying model architectures, performance requirements of applications, and more. The Amazon SageMaker Large Model Inference […]
( 13
min )
In this post, we show how to index information stored in websites and use the intelligent search in Amazon Kendra to search for answers from content stored in internal and external websites. In addition, the ML-powered intelligent search can accurately get answers for your questions from unstructured documents with natural language narrative content, for which keyword search is not very effective.
( 7
min )
Research Focus: Principal researcher Lester Mackey recognized for pioneering statistical and ML techniques; Pareto frontiers in neural feature learning; structural inequality in the influencer industry; new research on cardinality estimation.
The post Research Focus: Week of October 9, 2023 appeared first on Microsoft Research.
( 9
min )
Developers have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired locations. NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it…
( 6
min )
Gartner predicts blockchain’s economic impact to reach $176 billion by 2025 and $3.1 trillion by 2030. The AI software market is expected to reach $134.8 billion by 2025. Blockchain and AI benefit businesses. AI models process data, extract insights, and make decisions. Blockchain ensures data integrity and trust among participants. Read on to discover the…
The post How does combining blockchain and AI create new business opportunities? appeared first on Data Science Central.
( 22
min )
In the contemporary digital landscape, data has emerged as a critical asset for organizations aiming to make informed decisions and foster innovation. Data analytics can unlock a treasure trove of insights, driving competitive advantage and operational excellence by leveraging the vast amounts of data generated every second. As a consequence, the demand for skilled professionals…
The post Understanding the difference: Data analyst, data scientist, and data engineer appeared first on Data Science Central.
( 24
min )
I’ve been in this industry for over 40 years (yes, I just started in the data and analytics industry when I was 11), and I have NEVER seen anything like Artificial Intelligence (AI) and Generative AI (GenAI) capture the attention of CEOs (and the dystopic fear of everyone else). Is AI a game-changer? Definitely! Will…
The post 11 Questions Every CEO Should Ask about AI / Generative AI appeared first on Data Science Central.
( 23
min )
Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document […]
( 7
min )
Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680 thousand hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need […]
( 11
min )
In this blog, you will learn to build a cloud-native FL architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.
( 12
min )
Generative AI is helping creatives across many industries bring ideas to life at unprecedented speed. This technology will be on display at Adobe MAX, running through Thursday, Oct. 12, in person and virtually.
( 9
min )
Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a […]
( 14
min )
According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it […]
( 7
min )
A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users. In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research […]
( 11
min )
Posted by Sagar M. Waghmare, Senior Software Engineer, and Kimberly Wilber, Software Engineer, Google Research, Perception Team
As most people navigate their everyday world, they process visual input from the environment using an eye-level perspective. Unlike robots and self-driving cars, people don't have any "out-of-body" sensors to help guide them. Instead, a person’s sensory input is completely "egocentric", or "from the self." This also applies to new technologies that understand the world around us from a human-like perspective, e.g., robots navigating through unknown buildings, AR glasses that highlight objects, or assistive technology to help people run independently.
In computer vision, scene understanding is the subfield that studies how visible objects relate to the sce…
( 93
min )
This cutting-edge area of AI focuses on building models that can create original material, including music, images, text, and even entire virtual worlds.
The post Revolutionizing business: A look at generative AI’s real-world impact appeared first on Data Science Central.
( 20
min )
We use the maximum a posteriori estimation principle for learning
representations distributed on the unit sphere. We propose to use the angular
Gaussian distribution, which corresponds to a Gaussian projected onto the unit
sphere, and derive the associated loss function. We also consider the von
Mises-Fisher distribution, which is the conditional distribution of a Gaussian
restricted to the unit sphere. The learned representations are pushed toward
fixed directions, which are the prior means of the Gaussians, allowing for a
learning strategy
that is resilient to data drift. This makes it suitable for online continual
learning, which is the problem of training neural networks on a continuous data
stream, where multiple classification tasks are presented sequentially so that
data from past tasks are no longer accessible, and data from the current task
can be seen only once. To address this challenging scenario, we propose a
memory-based representation learning technique equipped with our new loss
functions. Our approach does not require negative data or knowledge of task
boundaries and performs well with smaller batch sizes while being
computationally efficient. We demonstrate with extensive experiments that the
proposed method outperforms the current state-of-the-art methods on both
standard evaluation scenarios and realistic scenarios with blurry task
boundaries. For reproducibility, we use the same training pipeline for every
compared method and share the code at https://t.ly/SQTj.
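A sketch of the fixed-direction loss as we read it, with frozen random prototypes as the prior means and a von Mises-Fisher style cross-entropy over scaled cosine similarities (the concentration `kappa` and the prototype construction are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Class prototypes are frozen random directions mu_c on the unit sphere; a
# vMF-style loss pushes normalized embeddings toward their class prototype.
C, d = 5, 16
mu = rng.standard_normal((C, d))
mu /= np.linalg.norm(mu, axis=1, keepdims=True)   # fixed prior means

def vmf_loss(z, y, kappa=10.0):
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = kappa * z @ mu.T                      # cosine similarity scaled by kappa
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(y)), y].mean()

z = rng.standard_normal((32, d))
y = rng.integers(0, C, 32)
print(round(vmf_loss(z, y), 3))
```

Because the target directions never move, the loss for past classes does not drift as new tasks arrive, which is what makes the scheme attractive for online continual learning.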
( 3
min )
Markov Decision Processes (MDPs) are a formal framework for modeling and
solving sequential decision-making problems. With finite time horizons, such
problems arise, for instance, in optimal stopping and specific supply-chain
problems, but also in the training of large language models. In contrast
to infinite-horizon MDPs, optimal policies are not stationary: a policy must be
learned for every single epoch. In practice, all parameters are often trained
simultaneously, ignoring the inherent structure suggested by dynamic
programming. This paper introduces a combination of dynamic programming and
policy gradient called dynamic policy gradient, where the parameters are
trained backwards in time. For the tabular softmax parametrisation we carry out
the convergence analysis for simultaneous and dynamic policy gradient towards
global optima, both in the exact and sampled gradient settings without
regularisation. It turns out that dynamic policy gradient training much better
exploits the structure of finite-time problems, which is reflected in improved
convergence bounds.
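The backward-in-time training loop can be sketched in a deliberately minimal setting with a single state and time-dependent rewards, so each epoch's objective decouples (with state transitions, each epoch's objective would also include the learned value of the later epochs):

```python
import numpy as np

# Minimal dynamic policy gradient sketch: horizon H, one state, two actions,
# time-dependent rewards r[h, a]. Softmax logits theta[h] are trained
# backwards in time (epoch H-1 first), each with the exact policy gradient.
H = 3
r = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])   # r[h, a]
theta = np.zeros((H, 2))

softmax = lambda x: np.exp(x - x.max()) / np.exp(x - x.max()).sum()
for h in reversed(range(H)):                          # dynamic: backwards in time
    for _ in range(500):
        pi = softmax(theta[h])
        J = pi @ r[h]
        grad = pi * (r[h] - J)                        # d/dtheta of E_pi[r_h]
        theta[h] += 0.5 * grad

greedy = [int(np.argmax(theta[h])) for h in range(H)]
print(greedy)
```

Training epoch by epoch mirrors the dynamic-programming structure: the parameters for later epochs are already optimized when earlier epochs are trained, which is the source of the improved convergence bounds.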
( 2
min )
Detecting and discovering new gene interactions based on known gene
expressions and gene interaction data presents a significant challenge. Various
statistical and deep learning methods have attempted to tackle this challenge
by leveraging the topological structure of gene interactions and gene
expression patterns to predict novel gene interactions. In contrast, some
approaches have focused exclusively on utilizing gene expression profiles. In
this context, we introduce GENER, a parallel-layer deep learning network
designed exclusively for the identification of gene-gene relationships using
gene expression data. We conducted two training experiments and compared the
performance of our network with that of existing statistical and deep learning
approaches. Notably, our model achieved an average AUROC score of 0.834 on the
combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting
gene-gene interactions.
( 2
min )
We propose a new gradient descent algorithm with added stochastic terms for
finding the global optimizers of nonconvex optimization problems. A key
component in the algorithm is the adaptive tuning of the randomness based on
the value of the objective function. In the language of simulated annealing,
the temperature is state-dependent. With this, we prove the global convergence
of the algorithm with an algebraic rate both in probability and in the
parameter space. This is a significant improvement over the classical rate from
using a more straightforward control of the noise term. The convergence proof
is based on the actual discrete setup of the algorithm, not just its continuous
limit as often done in the literature. We also present several numerical
examples to demonstrate the efficiency and robustness of the algorithm for
reasonably complex objective functions.
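A sketch of the idea, assuming the temperature is taken proportional to the current objective value (the paper's exact schedule may differ): on a double well, deterministic gradient descent started at the local maximum $x=0$ would stall, while the state-dependent noise kicks the iterate into a well and then vanishes as $f(x)\to 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Double-well objective with global minima at x = +-1 and f_min = 0; x = 0 is
# a stationary point where plain gradient descent would stall.
f = lambda x: 0.25 * (x**2 - 1.0) ** 2
grad = lambda x: x * (x**2 - 1.0)

x, eta = 0.0, 0.02
for _ in range(5000):
    # State-dependent temperature: the noise scale shrinks with f(x), so the
    # randomness vanishes as the iterate approaches a global minimizer.
    sigma = np.sqrt(f(x))
    x = x - eta * grad(x) + np.sqrt(eta) * sigma * rng.standard_normal()

print(round(x, 2))
```

Near a global minimizer the noise is multiplicative in the error and contracts, which is the mechanism behind the algebraic convergence rate claimed in the abstract.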
( 2
min )
An extension of Transformers is proposed that enables explicit relational
reasoning through a novel module called the Abstractor. At the core of the
Abstractor is a variant of attention called relational cross-attention. The
approach is motivated by an architectural inductive bias for relational
learning that disentangles relational information from extraneous features
about individual objects. This enables explicit relational reasoning,
supporting abstraction and generalization from limited data. The Abstractor is
first evaluated on simple discriminative relational tasks and compared to
existing relational architectures. Next, the Abstractor is evaluated on purely
relational sequence-to-sequence tasks, where dramatic improvements are seen in
sample efficiency compared to standard Transformers. Finally, Abstractors are
evaluated on a collection of tasks based on mathematical problem solving, where
modest but consistent improvements in performance and sample efficiency are
observed.
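Relational cross-attention, as we understand it from the abstract, computes attention scores from the input objects but takes its values from learned, input-independent symbols, so the output carries relational information disentangled from object features. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Queries and keys come from the input objects; values are learned "symbols"
# independent of the objects, so the output encodes only the relations.
n, d = 4, 8
X = rng.standard_normal((n, d))                 # input objects
Wq = rng.standard_normal((d, d)) / np.sqrt(d)
Wk = rng.standard_normal((d, d)) / np.sqrt(d)
S = rng.standard_normal((n, d))                 # learned symbols (object-independent)

scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
A = np.exp(scores - scores.max(axis=1, keepdims=True))
A = A / A.sum(axis=1, keepdims=True)            # relation matrix between objects
out = A @ S                                     # abstract, symbol-valued output
print(out.shape)
```

Standard self-attention would instead compute values from `X` itself, mixing object features into the output; routing learned symbols through the relation matrix is the inductive bias for abstraction.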
( 2
min )
In this paper, we introduce a novel class of graphical models for
representing time lag specific causal relationships and independencies of
multivariate time series with unobserved confounders. We completely
characterize these graphs and show that they constitute proper subsets of the
currently employed model classes. As we show, from the novel graphs one can
thus draw stronger causal inferences -- without additional assumptions. We
further introduce a graphical representation of Markov equivalence classes of
the novel graphs. This graphical representation contains more causal knowledge
than what current state-of-the-art causal discovery algorithms learn.
( 2
min )
We propose a graphical structure for structural equation models that is
stable under marginalization, given linearity and Gaussianity assumptions. We
show that computing the maximum likelihood estimate of this model is
equivalent to training a neural network. We implement a GPU-based algorithm
that computes the maximum likelihood estimates of these models.
( 2
min )
Neural networks have shown remarkable performance in computer vision, but
their deployment in numerous scientific and technical fields is challenging due
to their black-box nature. Scientists and practitioners need to evaluate the
reliability of a decision, i.e., to know simultaneously if a model relies on
the relevant features and whether these features are robust to image
corruptions. Existing attribution methods aim to provide human-understandable
explanations by highlighting important regions in the image domain, but fail to
fully characterize a decision process's reliability. To bridge this gap, we
introduce the Wavelet sCale Attribution Method (WCAM), a generalization of
attribution from the pixel domain to the space-scale domain using wavelet
transforms. Attribution in the wavelet domain reveals where {\it and} on what
scales the model focuses, thus enabling us to assess whether a decision is
reliable.
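As a rough illustration of the space-scale domain the WCAM operates in, below is a minimal one-level 2D Haar wavelet decomposition in NumPy. This is an assumption-laden sketch: the actual method uses proper wavelet transforms together with model gradients, and the single-level Haar choice here is purely illustrative.

```python
import numpy as np

def haar2d(img):
    """One level of the 2D Haar wavelet transform.

    Returns the approximation band LL plus three detail bands (LH, HL, HH);
    attribution in this domain asks on which bands -- i.e. which scales --
    a model's decision depends.
    """
    # transform along columns (axis 1) ...
    lo = (img[:, 0::2] + img[:, 1::2]) / 2.0
    hi = (img[:, 0::2] - img[:, 1::2]) / 2.0
    # ... then along rows (axis 0)
    ll = (lo[0::2] + lo[1::2]) / 2.0
    lh = (lo[0::2] - lo[1::2]) / 2.0
    hl = (hi[0::2] + hi[1::2]) / 2.0
    hh = (hi[0::2] - hi[1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d (no information is lost)."""
    lo = np.empty((2 * ll.shape[0], ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = ll + lh, ll - lh
    hi[0::2], hi[1::2] = hl + hh, hl - hh
    img = np.empty((lo.shape[0], 2 * lo.shape[1]))
    img[:, 0::2], img[:, 1::2] = lo + hi, lo - hi
    return img
```

Because the transform is invertible, attributions over the four bands account for the full image content at each scale.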
( 3
min )
We identify hidden layers inside a DNN with group actions on the data space,
and formulate the DNN as a dual voice transform with respect to the Koopman
operator, a linear representation of the group action. Based on
group-theoretic arguments, in particular Schur's lemma, we give a simple
proof of the universality of these DNNs.
( 2
min )
We study the problem of training a flow-based generative model, parametrized
by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture.
We provide a sharp end-to-end analysis of the problem. First, we provide a
tight closed-form characterization of the learnt velocity field, when
parametrized by a shallow denoising auto-encoder trained on a finite number $n$
of samples from the target distribution. Building on this analysis, we provide
a sharp description of the corresponding generative flow, which pushes the base
Gaussian density forward to an approximation of the target density. In
particular, we provide closed-form formulae for the distance between the mean
of the generated mixture and the mean of the target mixture, which we show
decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact
Bayes-optimal.
( 2
min )
While artificial neural networks have demonstrated exceptional practical
success in a variety of domains, investigations into their theoretical
characteristics, such as their approximation power, statistical properties, and
generalization performance, have concurrently made significant strides. In this
paper, we construct a novel theory for understanding the effectiveness of
neural networks, which offers a perspective distinct from prior research.
Specifically, we explore the rationale underlying a common practice during the
construction of neural network models: sample splitting. Our findings indicate
that the optimal hyperparameters derived from sample splitting can enable a
neural network model that asymptotically minimizes the prediction risk. We
conduct extensive experiments across different application scenarios and
network architectures, and the results confirm the effectiveness of our theory.
( 2
min )
Markov Decision Processes (MDPs) are a formal framework for modeling and
solving sequential decision-making problems. Over finite time horizons, such
problems arise, for instance, in optimal stopping and specific supply chain
problems, but also in the training of large language models. In contrast to
infinite-horizon MDPs, optimal policies are not stationary: a policy must be
learned for every single epoch. In practice, all parameters are often trained
simultaneously, ignoring the inherent structure suggested by dynamic
programming. This paper introduces a combination of dynamic programming and
policy gradient called dynamic policy gradient, where the parameters are
trained backwards in time. For the tabular softmax parametrisation we carry out
the convergence analysis for simultaneous and dynamic policy gradient towards
global optima, both in the exact and sampled gradient settings without
regularisation. It turns out that dynamic policy gradient training exploits
the structure of finite-time problems much better, which is reflected in
improved convergence bounds.
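The backward-in-time scheme can be sketched on a toy tabular finite-horizon MDP with exact gradients. The tiny MDP, learning rate, and iteration count below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dynamic_policy_gradient(P, r, H, lr=1.0, iters=300):
    """Train one softmax policy per epoch, backwards in time.

    P: (S, A, S) transition kernel, r: (S, A) rewards, H: horizon.
    Epoch h is trained against the value function already induced by the
    (fixed) later epochs -- the dynamic-programming structure the paper
    argues is ignored when all epochs are trained simultaneously.
    """
    S, A = r.shape
    theta = np.zeros((H, S, A))
    V_next = np.zeros(S)                      # value after the final epoch
    for h in range(H - 1, -1, -1):
        Q = r + P @ V_next                    # (S, A) epoch-h action values
        for _ in range(iters):                # exact policy-gradient ascent
            pi = softmax(theta[h])
            V = (pi * Q).sum(axis=1)
            theta[h] += lr * pi * (Q - V[:, None])
        V_next = (softmax(theta[h]) * Q).sum(axis=1)
    return theta, V_next                      # V_next is now V_0
```

Since later epochs are frozen while epoch h is trained, each per-state objective decouples and the per-state softmax gradient pi * (Q - V) is exact.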
( 2
min )
We propose conditional flows of the maximum mean discrepancy (MMD) with the
negative distance kernel for posterior sampling and conditional generative
modeling. This MMD, which is also known as energy distance, has several
advantageous properties like efficient computation via slicing and sorting. We
approximate the joint distribution of the ground truth and the observations
using discrete Wasserstein gradient flows and establish an error bound for the
posterior distributions. Further, we prove that our particle flow is indeed a
Wasserstein gradient flow of an appropriate functional. The power of our method
is demonstrated by numerical examples including conditional image generation
and inverse problems like superresolution, inpainting and computed tomography
in low-dose and limited-angle settings.
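In one dimension, the energy distance underlying these flows has a simple form. Below is a naive NumPy sketch using the O(n^2) pairwise version for clarity only; the paper's slicing-and-sorting computation is what makes it efficient in practice.

```python
import numpy as np

def energy_distance_1d(x, y):
    # E(X, Y) = 2 E|X - Y| - E|X - X'| - E|Y - Y'|, i.e. the MMD with the
    # negative distance kernel. Naive pairwise version; sorting the sliced
    # one-dimensional projections brings the cost down to O(n log n).
    cross = np.abs(x[:, None] - y[None, :]).mean()
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * cross - xx - yy

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 500)        # samples from the target
y = rng.normal(0.0, 1.0, 500)        # same distribution
z = rng.normal(3.0, 1.0, 500)        # shifted distribution
```

The distance is zero between identical samples and grows as the distributions separate, which is what makes it usable as a flow functional.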
( 2
min )
In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.
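A minimal sketch of the idea; all field names and profile attributes below are hypothetical, and the post itself wires in an actual feature store rather than plain dictionaries.

```python
def build_prompt(profile, items):
    """Assemble LLM input text from a user profile (as it might be read
    from a feature store) and candidate item attributes. Every key used
    here is an illustrative assumption."""
    lines = [
        f"User interests: {', '.join(profile['interests'])}.",
        f"Recently viewed: {', '.join(profile['recent'])}.",
        "Rank the following items for this user and explain briefly:",
    ]
    lines += [f"- {it['title']} ({it['genre']})" for it in items]
    return "\n".join(lines)

prompt = build_prompt(
    {"interests": ["sci-fi", "space"], "recent": ["Dune"]},
    [{"title": "The Martian", "genre": "sci-fi"},
     {"title": "Emma", "genre": "period drama"}],
)
```

The resulting context-aware text is what gets passed to the LLM to produce the personalized recommendation.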
( 13
min )
In this post, we provide an overview of popular multimodality models. We also demonstrate how to deploy these pre-trained models on Amazon SageMaker. Furthermore, we discuss the diverse applications of these models, focusing particularly on several real-world scenarios, such as zero-shot tag and attribution generation for ecommerce and automatic prompt generation from images.
( 13
min )
A research team is aiming to shake up the status quo for earthquake models. Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting. Dubbed RECAST, the model can use larger datasets and Read article >
( 6
min )
A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices.
The post Efficient and hardware-friendly neural architecture search with SpaceEvo appeared first on Microsoft Research.
( 10
min )
In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this using an Amazon Comprehend custom classification to build a multi-label custom classification model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.
( 8
min )
Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]
( 7
min )
Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.
( 11
min )
HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks.
The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.
( 10
min )
Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research.
The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.
( 29
min )
Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI Read article >
( 7
min )
October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn Read article >
( 9
min )
For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with Read article >
( 6
min )
This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experience for young researchers: mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year!
The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.
( 7
min )
Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]
( 11
min )
Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]
( 9
min )
Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]
( 7
min )
One of the most impressive generative AI applications I have seen is viperGPT. The image / site explains it best. This example, earlier this year, showed the potential of multimodal LLMs. And as of last week, that future is upon us: ChatGPT can now see, hear & speak. What are the implications… Read More »Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think?
The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.
( 20
min )
In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of… Read More »Cracking the code: The rising demand for data scientists in various industries
The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.
( 21
min )
I recently subscribed to OpenAI GPT-4 for the OpenAI Code Interpreter/Advanced Data Analytics. We are using it in our class at the University of Oxford. It's really cool, and we are also awaiting the multimodal OpenAI features. Recently, a well-known AI critic said that he does not see how generative AI companies could be… Read More »Generative AI megatrends: How many LLMs would you subscribe to?
The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.
( 19
min )
Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.
( 8
min )
A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities.
The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.
( 10
min )
With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair Read article >
( 6
min )
Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.
( 8
min )
Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]
( 11
min )
A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]
( 10
min )
In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]
( 9
min )
This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]
( 9
min )
By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.
( 10
min )
AI Weirdness: the strange side of machine learning
( 2
min )
We introduce RACH-Space, a novel classification method in ensemble learning.
In particular, we show its applicability as a label model for weakly supervised
learning. RACH-Space offers simplicity in implementation with minimal
assumptions on the data or weak signals. The model is well suited for scenarios
where fully labeled data is not available. Our method is built upon a
geometrical interpretation of the space spanned by weak signals. Our analysis
of the high-dimensional convex hull structure underlying a general set of weak
signals bridges geometry with machine learning. Empirical results also
demonstrate that RACH-Space works well in practice and compares favorably to
the best existing label models for weakly supervised learning.
( 2
min )
In this paper, we introduce a novel analysis of neural networks based on
geometric (Clifford) algebra and convex optimization. We show that optimal
weights of deep ReLU neural networks are given by the wedge product of training
samples when trained with standard regularized loss. Furthermore, the training
problem reduces to convex optimization over wedge product features, which
encode the geometric structure of the training dataset. This structure is given
in terms of signed volumes of triangles and parallelotopes generated by data
vectors. The convex problem finds a small subset of samples via $\ell_1$
regularization to discover only relevant wedge product features. Our analysis
provides a novel perspective on the inner workings of deep neural networks and
sheds light on the role of the hidden layers.
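The signed-volume features the analysis refers to can be illustrated directly: the wedge product of d vectors in R^d evaluates to a determinant, the signed volume of the parallelotope the vectors span.

```python
import numpy as np

def signed_volume(*vectors):
    # the wedge product v1 ∧ ... ∧ vd of d vectors in R^d, evaluated as a
    # determinant: the signed volume of the parallelotope they generate.
    # Orientation matters -- swapping two data vectors flips the sign.
    return np.linalg.det(np.stack(vectors))
```

For a triangle with vertices a, b, c, the signed area is half the signed volume spanned by the edge vectors b - a and c - a, which is exactly the kind of quantity the wedge-product features encode.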
( 2
min )
Property prediction plays an important role in material discovery. As an
initial step to eventually develop a foundation model for material science, we
introduce a new autoencoder called the MHG-GNN, which combines graph neural
network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of
property prediction tasks with diverse materials show that MHG-GNN is
promising.
( 2
min )
Consistency regularization-based methods are prevalent in semi-supervised
learning (SSL) algorithms due to their exceptional performance. However, they
mainly depend on domain-specific data augmentations, which are not usable in
domains where data augmentations are less practicable. On the other hand,
pseudo-labeling (PL) is a general, domain-agnostic SSL approach that, unlike
consistency regularization-based methods, does not rely on domain-specific
augmentations. However, PL underperforms due to erroneous high-confidence
predictions from poorly calibrated models. This paper proposes an
uncertainty-aware pseudo-label selection framework that employs uncertainty
sets yielded by the conformal regularization algorithm to compensate for
poorly calibrated neural networks, reducing noisy training data.
https://github.com/matinmoezzi/ups_conformal_classification
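The abstract does not spell out the selection rule, so the following is only a sketch of a standard split-conformal filter that keeps pseudo-labels whose prediction set is a singleton; the function and variable names are made up.

```python
import numpy as np

def select_pseudo_labels(cal_probs, cal_labels, unl_probs, alpha=0.1):
    """Split-conformal pseudo-label filter (illustrative sketch).

    cal_probs: (n, K) softmax outputs on held-out labeled data,
    unl_probs: (m, K) softmax outputs on unlabeled data.
    Keeps only unlabeled points whose conformal prediction set is a
    singleton, i.e. whose pseudo-label is unambiguous at level alpha.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]   # nonconformity
    k = int(np.ceil((n + 1) * (1.0 - alpha))) - 1        # conformal rank
    qhat = np.sort(scores)[min(k, n - 1)]
    in_set = (1.0 - unl_probs) <= qhat                   # (m, K) set membership
    keep = in_set.sum(axis=1) == 1
    return keep, unl_probs.argmax(axis=1)
```

Only examples passing the filter would be added to the training set with their argmax pseudo-label, discarding the ambiguous ones a miscalibrated network would otherwise label with false confidence.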
( 2
min )
Adversarial Machine Learning (AML) is a rapidly growing field of security
research, with an often overlooked area being model attacks through
side-channels. Previous works show such attacks to be serious threats, though
little progress has been made on efficient remediation strategies that avoid
costly model re-engineering. This work demonstrates a new defense against AML
side-channel attacks using model compilation techniques, namely tensor
optimization. We show relative model attack effectiveness decreases of up to
43% using tensor optimization, discuss the implications, and outline
directions for future work.
( 2
min )
In this paper, we introduce two types of novel Asymptotic-Preserving
Convolutional Deep Operator Networks (APCONs) designed to address the
multiscale time-dependent linear transport problem. We observe that vanilla
physics-informed DeepONets with a modified MLP may exhibit instability in
maintaining the desired limiting macroscopic behavior. This necessitates the
use of an asymptotic-preserving loss function. Drawing
inspiration from the heat kernel in the diffusion equation, we propose a new
architecture called Convolutional Deep Operator Networks, which employ multiple
local convolution operations instead of a global heat kernel, along with
pooling and activation operations in each filter layer. Our APCON methods
possess a parameter count that is independent of the grid size and are capable
of capturing the diffusive behavior of the linear transport problem. Finally,
we validate the effectiveness of our methods through several numerical
examples.
( 2
min )
This paper studies the problem of learning the large-scale Gaussian graphical
models that are multivariate totally positive of order two ($\text{MTP}_2$). By
introducing the concept of bridge, which commonly exists in large-scale sparse
graphs, we show that the entire problem can be equivalently optimized through
(1) several smaller-scaled sub-problems induced by a \emph{bridge-block
decomposition} on the thresholded sample covariance graph and (2) a set of
explicit solutions on entries corresponding to \emph{bridges}. From a
practical standpoint, this simple and provable discipline can be applied to
break a large problem down into small tractable ones, leading to an enormous
reduction in computational complexity and substantial improvements for all existing
algorithms. The synthetic and real-world experiments demonstrate that our
proposed method presents a significant speed-up compared to the
state-of-the-art benchmarks.
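The decomposition rests on finding bridges in the thresholded sample covariance graph. A minimal sketch, using the naive remove-and-recount bridge test rather than the paper's algorithm, might look like:

```python
import numpy as np

def support_graph(S, tau):
    """Edges of the thresholded sample covariance graph: |S_ij| > tau."""
    d = S.shape[0]
    return {(i, j) for i in range(d) for j in range(i + 1, d)
            if abs(S[i, j]) > tau}

def n_components(d, edges):
    """Count connected components of the graph on d nodes via DFS."""
    adj = {i: set() for i in range(d)}
    for i, j in edges:
        adj[i].add(j); adj[j].add(i)
    seen, comps = set(), 0
    for s in range(d):
        if s in seen:
            continue
        comps += 1
        stack = [s]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj[v] - seen)
    return comps

def bridges(d, edges):
    """An edge is a bridge iff removing it increases the component count."""
    base = n_components(d, edges)
    return {e for e in edges if n_components(d, edges - {e}) > base}
```

Removing the bridges splits the graph into the bridge-blocks, on which the smaller $\text{MTP}_2$ sub-problems are then solved independently; the bridge entries themselves admit the explicit solutions mentioned above.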
( 2
min )
Inferring biological relationships from cellular phenotypes in high-content
microscopy screens provides significant opportunity and challenge in biological
research. Prior results have shown that deep vision models can capture
biological signal better than hand-crafted features. This work explores how
weakly supervised and self-supervised deep learning approaches scale when
training larger models on larger datasets. Our results show that both CNN- and
ViT-based masked autoencoders significantly outperform weakly supervised
models. At the high end of our scale, a ViT-L/8 trained on over 3.5 billion
unique crops sampled from 95 million microscopy images achieves relative
improvements as high as 28% over our best weakly supervised models at inferring
known biological relationships curated from public databases.
( 2
min )
We prove a fundamental limitation on the efficiency of a wide class of
Reinforcement Learning (RL) algorithms. This limitation applies to model-free
RL methods as well as a broad range of model-based methods, such as planning
with tree search.
Under an abstract definition of this class, we provide a family of RL
problems for which these methods suffer a lower bound exponential in the
horizon for their interactions with the environment to find an optimal
behavior. However, there exists a method, not tailored to this specific family
of problems, which can efficiently solve the problems in the family.
In contrast, our limitation does not apply to several types of methods
proposed in the literature, for instance, goal-conditioned methods or other
algorithms that construct an inverse dynamics model.
( 2
min )
This work reports the empirical performance of an automated medical landmark
detection method for predicting clinical markers in hip radiograph images.
Notably, the detection method was trained using a label-only augmentation
scheme; our results indicate that this form of augmentation outperforms
traditional data augmentation and produces highly sample efficient estimators.
We train a generic U-Net-based architecture under a curriculum consisting of
two phases: initially relaxing the landmarking task by enlarging the label
points to regions, then gradually eroding these label regions back to the base
task. We measure the benefits of this approach on six datasets of radiographs
with gold-standard expert annotations.
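The relax-then-erode curriculum can be sketched as follows; the disk-shaped regions and the radius schedule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def relaxed_mask(points, shape, radius):
    """Enlarge landmark points to disks of the given radius.

    points: iterable of (row, col) landmark pixel coordinates.
    At radius 0 the mask reduces to the base landmarking target.
    """
    yy, xx = np.mgrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for r, c in points:
        mask |= (yy - r) ** 2 + (xx - c) ** 2 <= radius ** 2
    return mask

def curriculum(points, shape, radii=(8, 4, 2, 0)):
    # one target mask per training phase, from most relaxed (large regions)
    # gradually eroded back to the base task (single pixels)
    return [relaxed_mask(points, shape, rad) for rad in radii]
```

Each phase trains the U-Net against the next, smaller target until the network is landmarking individual points again.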
( 2
min )
We propose a novel framework that combines deep generative time series models
with decision theory for generating personalized treatment strategies. It
leverages historical patient trajectory data to jointly learn the generation of
realistic personalized treatment and future outcome trajectories through deep
generative time series models. In particular, our framework enables the
generation of novel multivariate treatment strategies tailored to the
personalized patient history and trained for optimal expected future outcomes
based on conditional expected utility maximization. We demonstrate our
framework by generating personalized insulin treatment strategies and blood
glucose predictions for hospitalized diabetes patients, showcasing the
potential of our approach for generating improved personalized treatment
strategies. Keywords: deep generative model, probabilistic decision support,
personalized treatment generation, insulin and blood glucose prediction
( 2
min )
In this analysis, we use a K-nearest neighbors (KNN) model to conduct crop segmentation, and we compare these results with ground truth imagery on an agricultural region. Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. These results are a testament to the power of Planet’s high-cadence geospatial imagery. Agricultural fields change often, sometimes multiple times a season, and having high-frequency satellite imagery available to observe and analyze this land can provide immense value to our understanding of agricultural land and quickly-changing environments.
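A minimal hand-rolled version of the per-pixel KNN step, with made-up spectral-band features standing in for the actual Planet imagery and crop classes:

```python
import numpy as np

def knn_classify(train_X, train_y, query_X, k=5):
    """Classify each query pixel by majority vote of the k nearest labeled
    pixels (Euclidean distance in spectral-band feature space)."""
    d = np.linalg.norm(query_X[:, None, :] - train_X[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(train_y[row]).argmax() for row in nearest])

# illustrative two-crop example; the 4-band reflectance means are invented
rng = np.random.default_rng(0)
corn = rng.normal([0.1, 0.3, 0.5, 0.7], 0.05, size=(100, 4))
soy = rng.normal([0.2, 0.4, 0.3, 0.5], 0.05, size=(100, 4))
X = np.vstack([corn, soy])
y = np.array([0] * 100 + [1] * 100)
pred = knn_classify(X, y, rng.normal([0.1, 0.3, 0.5, 0.7], 0.05, size=(10, 4)))
```

In the real analysis, the labeled pixels come from the ground truth imagery and the queries from current high-cadence scenes, so the classifier can track fields that changed after the ground truth was collected.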
( 15
min )
In a talk, now available online, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore’s law era. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems Read article >
( 6
min )
This post is co-written with Ilan Geller and Shuyu Yang from Accenture. Enterprises today face major challenges when it comes to using their information and knowledge bases for both internal and external business operations. With constantly evolving operations, processes, policies, and compliance requirements, it can be extremely difficult for employees and customers to stay up […]
( 8
min )
We’re excited to announce that Amazon SageMaker Canvas now offers a quicker and more user-friendly way to create machine learning models for time-series forecasting. SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate machine learning (ML) models without requiring any machine learning experience or having to write a single line of code. SageMaker […]
( 7
min )
In the world of data-driven decision-making, time series forecasting is key in enabling businesses to use historical data patterns to anticipate future outcomes. Whether you are working in asset risk management, trading, weather prediction, energy demand forecasting, vital sign monitoring, or traffic analysis, the ability to forecast accurately is crucial for success. In these applications, […]
( 10
min )
In the rapidly evolving world of AI and machine learning (ML), foundation models (FMs) have shown tremendous potential for driving innovation and unlocking new use cases. However, as organizations increasingly harness the power of FMs, concerns surrounding data privacy, security, added cost, and compliance have become paramount. Regulated and compliance-oriented industries, such as financial services, […]
( 13
min )
Companies use time series forecasting to make core planning decisions that help them navigate through uncertain futures. This post is meant to address supply chain stakeholders, who share a common need of determining how many finished goods are needed over a mixed variety of planning time horizons. In addition to planning how many units of […]
( 11
min )
From startups to enterprises, organizations of all sizes are getting started with generative AI. They want to capitalize on generative AI and translate the momentum from betas, prototypes, and demos into real-world productivity gains and innovations. But what do organizations need to bring generative AI into the enterprise and make it real? When we talk […]
( 13
min )
The wait is over. GeForce NOW Ultimate members can experience Cyberpunk 2077: Phantom Liberty on GOG.com at full GeForce RTX 4080 quality, with support for NVIDIA DLSS 3.5 technology. It’s part of an action-packed GFN Thursday, with 26 more games joining the cloud gaming platform’s library, including Quake II from id Software. A New Look Read article >
( 8
min )
Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the […]
The post AI Frontiers: Measuring and mitigating harms with Hanna Wallach appeared first on Microsoft Research.
( 29
min )
The iconic sci-fi opera “VALIS,” first composed by Professor Tod Machover in 1987, reboots at MIT for a new generation.
( 11
min )
Inspired by physics, a new generative model PFGM++ outperforms diffusion models in image generation.
( 10
min )
The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline […]
( 13
min )
MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to […]
( 5
min )
DENZA, the luxury electric-vehicle brand and joint venture between BYD and Mercedes-Benz, is debuting new intelligent driving features for its entire N7 model lineup, powered by the NVIDIA DRIVE Orin system-on-a-chip (SoC). The N7 series was introduced earlier this year as a family of spacious five-seater SUVs for commuters looking to sport a deluxe EV Read article >
( 5
min )
Medical-device company Invenio Imaging is developing technology that enables surgeons to evaluate tissue biopsies in the operating room, immediately after samples are collected — providing in just three minutes AI-accelerated insights that would otherwise take weeks to obtain from a pathology lab. In a surgical biopsy, a medical professional removes samples of cells or tissue
( 6
min )
As generative AI sweeps across corporate boardrooms around the world, global telecommunications companies are exploring how to cost-effectively deliver many of these new AI applications to the edge over 5G and upcoming 6G networks. Telcos plan to deploy over 17 million 5G microcells and towers worldwide by 2025. Building, managing and optimizing this new infrastructure
( 6
min )
Chunked prefills & decode-maximal batching boost LLM inference; DragNUWA combines text, image, and trajectory for fine-grained video content control; reconstructing images from human brain signals; structural inequalities in creator-audience relationships.
The post Research Focus: Week of September 25, 2023 appeared first on Microsoft Research.
( 9
min )
Talk about a Grand Slam. Denny’s CEO Kelli Valade was joined Tuesday by NVIDIA CEO Jensen Huang to unveil a plaque at the Silicon Valley Denny’s where NVIDIA’s founders hatched their idea for a chip that would enable realistic 3D graphics on personal computers. “This is a place where we fuel ideas. Your story is
( 6
min )
From gaming to creating to everyday productivity, NVIDIA RTX graphics cards feature specialized Tensor Cores that deliver cutting-edge performance and transformative capabilities for AI.
( 7
min )
As machine learning (ML) goes mainstream and gains wider adoption, ML-powered inference applications are becoming increasingly common to solve a range of complex business problems. The solution to these complex business problems often requires using multiple ML models and steps. This post shows you how to build and host an ML application with custom containers […]
( 13
min )
This post was co-authored with Daniele Chiappalupi, participant of the AWS student Hackathon team at ETH Zürich. Everyone can easily get started with machine learning (ML) using Amazon SageMaker JumpStart. In this post, we show you how a university Hackathon team used SageMaker JumpStart to quickly build an application that helps users identify and remove […]
( 9
min )
We’re at an exciting inflection point in the widespread adoption of machine learning (ML), and we believe most customer experiences and applications will be reinvented with generative AI. Generative AI can create new content and ideas, including conversations, stories, images, videos, and music. Like most AI, generative AI is powered by ML models—very large models […]
( 12
min )
E-commerce has brought greater technology and convenience to consumers globally, but it has also created opportunities for fraud, which merchants and platforms must fight to protect their businesses and customers. Anomaly detection is a powerful tool for identifying irregular patterns and potential fraud. This article explores how anomaly detection is used in fraud detection for e-commerce and discusses different…
The post In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches? appeared first on Data Science Central.
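To make the role of anomaly detection concrete, here is a minimal sketch of one of the simplest approaches in this family: flagging order amounts whose robust z-score (based on the median and MAD, which resist the very outliers being hunted) is extreme. The synthetic data and the threshold of 4 are illustrative assumptions, not figures from the article.

```python
import numpy as np

def robust_zscores(values):
    """Robust z-scores via median and MAD -- resistant to the outliers
    we are trying to find, unlike mean/std."""
    med = np.median(values)
    mad = np.median(np.abs(values - med)) * 1.4826   # consistent with std
    return (values - med) / mad

rng = np.random.default_rng(4)
amounts = rng.normal(60.0, 15.0, size=500)   # typical order values
amounts[0] = 2500.0                          # a fraudulent-looking order
flags = np.abs(robust_zscores(amounts)) > 4  # threshold is a design choice
```

The huge order is flagged while ordinary orders pass; in practice this per-feature rule would be one signal among many in a fraud pipeline.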
( 22
min )
Thanks to the internet, you can now easily expand your reach and engage with diverse audiences wherever they are. However, this opportunity raises an important question: how can you localize your web content and maintain the security and privacy of sensitive data? This article comprehensively explores the best practices that will help you maintain data…
The post The essential guide on data security and privacy in web localization appeared first on Data Science Central.
( 22
min )
Microsoft researchers are introducing AutoGen, a framework for simplifying the orchestration, optimization, and automation of workflows for large language model (LLM) applications—potentially transforming and extending what LLMs can do.
The post AutoGen: Enabling next-generation large language model applications appeared first on Microsoft Research.
( 10
min )
ChatGPT, Bard, GPT-4, and the like are often pitched as ways to retrieve information. The problem is they'll "retrieve" whatever you ask for, whether or not it exists.
Tumblr user @indigofoxpaws sent me a few screenshots where they'd asked ChatGPT for an explanation of
( 3
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Adversarial examples, deliberately crafted using small perturbations to fool
deep neural networks, were first studied in image processing and more recently
in NLP. While approaches to detecting adversarial examples in NLP have largely
relied on search over input perturbations, image processing has seen a range of
techniques that aim to characterise adversarial subspaces over the learned
representations.
In this paper, we adapt two such approaches to NLP, one based on nearest
neighbors and influence functions and one on Mahalanobis distances. The former
in particular produces a state-of-the-art detector when compared against
several strong baselines; moreover, the novel use of influence functions
provides insight into how the nature of adversarial example subspaces in NLP
relate to those in image processing, and also how they differ depending on the
kind of NLP task.
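As a minimal sketch of the Mahalanobis-distance idea (on synthetic representations, not the paper's models or data): fit class means with a shared covariance over learned representations, then score an input by its Mahalanobis distance to the nearest class mean; unusually large distances suggest an adversarial example lying off the clean-data subspace.

```python
import numpy as np

def fit_mahalanobis(reps, labels):
    """Fit per-class means and a shared precision matrix on clean reps."""
    classes = np.unique(labels)
    means = {c: reps[labels == c].mean(axis=0) for c in classes}
    centered = np.vstack([reps[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(reps.shape[1])
    return means, np.linalg.inv(cov)

def mahalanobis_score(x, means, precision):
    """Adversarial score: squared distance to the nearest class mean."""
    return min(float((x - m) @ precision @ (x - m)) for m in means.values())

rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 8))
reps[100:] += 3.0                       # two separated "classes"
labels = np.array([0] * 100 + [1] * 100)
means, precision = fit_mahalanobis(reps, labels)
in_dist = mahalanobis_score(reps[0], means, precision)      # clean example
outlier = mahalanobis_score(np.full(8, 10.0), means, precision)  # off-manifold
```

Thresholding the score then yields a detector; the paper's NLP adaptation applies the same machinery to transformer representations.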
( 2
min )
Rapid and accurate identification of venous thromboembolism (VTE), a severe
cardiovascular condition including deep vein thrombosis (DVT) and pulmonary
embolism (PE), is important for effective treatment. Leveraging Natural
Language Processing (NLP) on radiology reports, automated methods have shown
promising advancements in identifying VTE events from retrospective data
cohorts or aiding clinical experts in identifying VTE events from radiology
reports. However, effectively training Deep Learning (DL) and NLP models is
challenging due to limited labeled medical text data, the complexity and
heterogeneity of radiology reports, and data imbalance. This study proposes
novel method combinations of DL methods, along with data augmentation, adaptive
pre-trained NLP model selection, and a clinical expert NLP rule-based
classifier, to improve the accuracy of VTE identification in unstructured
(free-text) radiology reports. Our experimental results demonstrate the model's
efficacy, achieving an impressive 97% accuracy and 97% F1 score in predicting DVT, and an outstanding 98.3% accuracy and 98.4% F1 score in predicting PE.
These findings emphasize the model's robustness and its potential to
significantly contribute to VTE research.
( 2
min )
Action scene understanding in soccer is a challenging task due to the complex
and dynamic nature of the game, as well as the interactions between players.
This article provides a comprehensive overview of this task divided into action
recognition, spotting, and spatio-temporal action localization, with a
particular emphasis on the modalities used and multimodal methods. We explore
the publicly available data sources and metrics used to evaluate models'
performance. The article reviews recent state-of-the-art methods based on deep learning as well as traditional approaches. We focus on multimodal
methods, which integrate information from multiple sources, such as video and
audio data, and also those that represent one source in various ways. The
advantages and limitations of methods are discussed, along with their potential
for improving the accuracy and robustness of models. Finally, the article
highlights some of the open research questions and future directions in the
field of soccer action recognition, including the potential for multimodal
methods to advance this field. Overall, this survey provides a valuable
resource for researchers interested in the field of action scene understanding
in soccer.
( 2
min )
This paper presents a Hierarchical Reinforcement Learning methodology
tailored for optimizing CubeSat task scheduling in Low Earth Orbits (LEO).
Incorporating a high-level policy for global task distribution and a low-level
policy for real-time adaptations as a safety mechanism, our approach integrates
the Similarity Attention-based Encoder (SABE) for task prioritization and an
MLP estimator for energy consumption forecasting. Integrating this mechanism
creates a safe and fault-tolerant system for CubeSat task scheduling.
Simulation results validate the Hierarchical Reinforcement Learning approach's superior convergence and task success rate, outperforming both the MADDPG model and
traditional random scheduling across multiple CubeSat configurations.
( 2
min )
A common formulation of constrained reinforcement learning involves multiple
rewards that must individually accumulate to given thresholds. In this class of
problems, we show a simple example in which the desired optimal policy cannot
be induced by any weighted linear combination of rewards. Hence, there exist
constrained reinforcement learning problems for which neither regularized nor
classical primal-dual methods yield optimal policies. This work addresses this
shortcoming by augmenting the state with Lagrange multipliers and
reinterpreting primal-dual methods as the portion of the dynamics that drives
the multipliers evolution. This approach provides a systematic state
augmentation procedure that is guaranteed to solve reinforcement learning
problems with constraints. Thus, as we illustrate by an example, while previous
methods can fail at finding optimal policies, running the dual dynamics while
executing the augmented policy yields an algorithm that provably samples
actions from the optimal policy.
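A toy illustration of the paper's recipe, under simplifying assumptions (a single constraint, two actions, and a hand-coded policy switch standing in for a learned policy): the state is augmented with a Lagrange multiplier, and the dual dynamics push the multiplier up whenever the constraint reward runs below its threshold, so the multiplier-conditioned policy mixes actions at the right long-run rate.

```python
import numpy as np

def dual_dynamics_step(lmbda, constraint_reward, threshold, eta=0.05):
    """Projected dual ascent: raise lambda when the constraint reward
    falls short of its threshold, lower it when satisfied."""
    return max(0.0, lmbda + eta * (threshold - constraint_reward))

def augmented_policy(state, lmbda):
    """Toy policy over two actions conditioned on (state, lambda).
    Action 1 earns constraint reward; action 0 earns task reward.
    A large lambda tilts the policy toward satisfying the constraint."""
    return 1 if lmbda > 1.0 else 0

# Run the dual dynamics while executing the augmented policy.
lmbda, satisfied = 0.0, []
for t in range(200):
    action = augmented_policy(state=0, lmbda=lmbda)
    constraint_reward = 1.0 if action == 1 else 0.0
    lmbda = dual_dynamics_step(lmbda, constraint_reward, threshold=0.7)
    satisfied.append(constraint_reward)

# After burn-in, the frequency of the constraint action oscillates
# around the 0.7 threshold -- the mixing no fixed policy here achieves.
rate = np.mean(satisfied[100:])
```

The oscillation around the threshold is exactly the behaviour the paper formalises: sampling from the dual dynamics induces the optimal constrained (possibly stochastic) policy that no single weighted-reward policy can match.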
( 2
min )
Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using
Transformers has appeared in the domain of reinforcement learning (RL), but it
is faced with unique design choices and challenges brought by the nature of RL.
However, the evolution of Transformers in RL has not yet been well unraveled.
In this paper, we seek to systematically review motivations and progress on
using Transformers in RL, provide a taxonomy on existing works, discuss each
sub-field, and summarize future prospects.
( 2
min )
We formulate a data independent latent space regularisation constraint for
general unsupervised autoencoders. The regularisation rests on sampling the
autoencoder Jacobian at Legendre nodes, the sample points of Gauss-Legendre quadrature. Revisiting this classic construction enables us to prove that regularised
autoencoders ensure a one-to-one re-embedding of the initial data manifold to
its latent representation. Demonstrations show that prior proposed
regularisation strategies, such as contractive autoencoding, cause topological
defects already for simple examples, and so do convolutional based
(variational) autoencoders. In contrast, topological preservation is ensured already by standard multilayer perceptron neural networks when regularised with our technique. This observation extends from the classic FashionMNIST dataset up to real-world encoding problems for MRI brain scans, suggesting that, across disciplines, this regularisation technique can deliver reliable low-dimensional representations of complex high-dimensional datasets.
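The sampling step can be sketched with NumPy's Gauss-Legendre routine; the toy 1D decoder, the finite-difference Jacobian, and the squared-norm penalty below are illustrative stand-ins for the paper's actual regulariser, showing only how a quadrature over Legendre nodes turns pointwise Jacobian samples into a latent-space penalty.

```python
import numpy as np

def decoder(z):
    """Toy 1D -> 2D decoder standing in for an autoencoder's decoder."""
    return np.array([np.tanh(z), 0.5 * z])

def jacobian_fd(f, z, eps=1e-5):
    """Central finite-difference Jacobian of f at latent point z."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Legendre nodes (roots of the degree-8 Legendre polynomial) on [-1, 1]:
# the sample points of Gauss-Legendre quadrature, with their weights.
nodes, weights = np.polynomial.legendre.leggauss(8)

# Quadrature estimate of the mean squared Jacobian norm over the latent
# interval -- a penalty that discourages a collapsing (non-injective) map.
penalty = 0.5 * sum(
    w * np.sum(jacobian_fd(decoder, z) ** 2) for z, w in zip(nodes, weights)
)
```

In training, such a quadrature term would be added to the reconstruction loss; eight nodes already integrate the smooth integrand here to high accuracy.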
( 2
min )
Indoor localization is in increasing demand for various cutting-edge technologies, such as virtual/augmented reality and the smart home. Traditional model-based localization suffers from significant computational overhead, so fingerprint localization is attracting growing attention, since it requires little computation once the fingerprint database is built. However, the accuracy of indoor localization is limited by the complicated indoor environment, which introduces multipath signal refraction. In this paper, we provide a scheme to improve the accuracy of indoor fingerprint localization in the frequency domain by predicting the channel state information (CSI) values of another transmitting channel and splicing the multi-band information together to obtain more precise localization results. We tested our proposed scheme on COST 2100 simulation data and on real-time orthogonal frequency division multiplexing (OFDM) WiFi data collected from an office scenario.
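A minimal sketch of the splicing idea on synthetic data: fingerprints from two bands are concatenated before nearest-neighbour matching, so the query is matched against a richer multi-band signature. The toy CSI features, the stand-in "predicted" second band, and the kNN localiser are illustrative assumptions, not the paper's method.

```python
import numpy as np

def knn_localize(fingerprint_db, positions, query, k=3):
    """Nearest-neighbour fingerprint localization: average the positions
    of the k reference points whose fingerprints best match the query."""
    dists = np.linalg.norm(fingerprint_db - query, axis=1)
    idx = np.argsort(dists)[:k]
    return positions[idx].mean(axis=0)

rng = np.random.default_rng(6)
positions = rng.uniform(0, 10, size=(25, 2))           # reference points (m)
band_a = np.sin(positions @ rng.normal(size=(2, 16)))  # toy CSI, band A
band_b = np.cos(positions @ rng.normal(size=(2, 16)))  # toy "predicted" band B
db = np.hstack([band_a, band_b])                       # spliced multi-band DB

# Querying with a stored multi-band fingerprint recovers its position.
est = knn_localize(db, positions, db[0], k=1)
```

Doubling the fingerprint dimension this way is what makes nearby reference points easier to disambiguate than with a single band.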
( 2
min )
Consider an online convex optimization problem where the loss functions are
self-concordant barriers, smooth relative to a convex function $h$, and
possibly non-Lipschitz. We analyze the regret of online mirror descent with mirror map $h$ and, based on this result, prove the following in a unified manner.
Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online
portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of
exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3}
)$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4}
d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio
selection, the regret of online mirror descent with the logarithmic barrier is
$\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due
to Orseau et al. up to logarithmic terms. 3. For online learning quantum states
with the logarithmic loss, the regret of online mirror descent with the
log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration
time is shorter than all existing algorithms we know.
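For a concrete reference point, here is the classic exponentiated-gradient update for online portfolio selection, the baseline that $\widetilde{\text{EG}}$ varies: mirror descent with the entropy mirror map applied to the log loss $-\log(x^\top r_t)$. The synthetic price relatives and the step size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def eg_update(x, price_relatives, eta=0.05):
    """One exponentiated-gradient step for online portfolio selection.
    The round-t loss is -log(x . r); its gradient is -r / (x . r), and
    EG is mirror descent with the entropy mirror map (multiplicative
    update followed by renormalization onto the simplex)."""
    grad = -price_relatives / (x @ price_relatives)
    w = x * np.exp(-eta * grad)
    return w / w.sum()

rng = np.random.default_rng(1)
d, T = 4, 500
x = np.full(d, 1.0 / d)                 # start at the uniform portfolio
log_wealth = 0.0
for t in range(T):
    r = rng.uniform(0.9, 1.1, size=d)   # price relatives for round t
    r[0] = 1.02                         # asset 0 drifts upward
    log_wealth += np.log(x @ r)
    x = eg_update(x, r)
```

Over many rounds the portfolio concentrates on the drifting asset; the barrier-based methods analysed in the paper replace the entropy mirror map to obtain the improved regret rates quoted above.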
( 3
min )
We study the problem of in-context learning (ICL) with large language models
(LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak
or regurgitate the private examples demonstrated in the prompt. We propose a
novel algorithm that generates synthetic few-shot demonstrations from the
private dataset with formal differential privacy (DP) guarantees, and show
empirically that it can achieve effective ICL. We conduct extensive experiments
on standard benchmarks and compare our algorithm with non-private ICL and
zero-shot solutions. Our results demonstrate that our algorithm can achieve
competitive performance with strong privacy levels. These results open up new
possibilities for ICL with privacy protection for a broad range of
applications.
( 2
min )
Initialization of neural network weights plays a pivotal role in determining
their performance. Feature Imitating Networks (FINs) offer a novel strategy by
initializing weights to approximate specific closed-form statistical features,
setting a promising foundation for deep learning architectures. While the
applicability of FINs has been chiefly tested in biomedical domains, this study
extends its exploration into other time series datasets. Three different
experiments are conducted in this study to test the applicability of imitating
Tsallis entropy for performance enhancement: Bitcoin price prediction, speech
emotion recognition, and chronic neck pain detection. For the Bitcoin price
prediction, models embedded with FINs reduced the root mean square error by
around 1000 compared to the baseline. In the speech emotion recognition task,
the FIN-augmented model increased classification accuracy by over 3 percent.
Lastly, in the CNP detection experiment, an improvement of about 7 percent was
observed compared to established classifiers. These findings validate the broad
utility and potency of FINs in diverse applications.
( 2
min )
Decision-making in real-world applications increasingly relies on data-driven models. We investigated the synergy between data-driven methods, empirical domain knowledge, and first-principles simulations. We showed the potential risk of biased results
when using data-driven models without causal analysis. Using a case study
assessing the implication of several design solutions on the energy consumption
of a building, we proved the necessity of causal analysis during the
data-driven modeling process. We concluded that: (a) Data-driven models'
accuracy assessment or domain knowledge screening may not rule out biased and
spurious results; (b) Data-driven models' feature selection should involve
careful consideration of causal relationships, especially colliders; (c) Causal
analysis results can be used as an aid to first-principles simulation design
and parameter checking to avoid cognitive biases. We proved the benefits of
causal analysis when applied to data-driven models in building engineering.
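Conclusion (b)'s warning about colliders can be reproduced in a few lines of simulation (the variables are generic stand-ins, not the building case study): two genuinely independent causes become spuriously correlated once the analysis conditions on their common effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)                   # e.g., a design choice
y = rng.normal(size=n)                   # an independent second cause
c = x + y + 0.1 * rng.normal(size=n)     # collider: both cause c

marginal = np.corrcoef(x, y)[0, 1]       # ~0: x and y are independent
mask = c > 1.0                           # "controlling for" the collider
conditional = np.corrcoef(x[mask], y[mask])[0, 1]  # strong spurious link
```

Selecting on the collider induces a pronounced negative correlation between `x` and `y`, which is why a model that includes collider features can pass accuracy checks while encoding a spurious relationship.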
( 2
min )
This work describes the TrueLearn Python library, which contains a family of
online learning Bayesian models for building educational (or more generally,
informational) recommendation systems. This family of models was designed
following the "open learner" concept, using humanly intuitive user
representations. For the sake of interpretability and putting the user in
control, the TrueLearn library also contains different representations to help
end-users visualise the learner models, which may in the future facilitate user
interaction with their own models. Together with the library, we include a
previously publicly released implicit feedback educational dataset with
evaluation metrics to measure the performance of the models. The extensive
documentation and coding examples make the library highly accessible to both
machine learning developers and educational data mining and learning analytic
practitioners. The library and the support documentation with examples are
available at https://truelearn.readthedocs.io/en/latest.
( 2
min )
Efficient training of large-scale graph neural networks (GNNs) has been
studied with a specific focus on reducing their memory consumption. Work by Liu
et al. (2022) proposed extreme activation compression (EXACT), which demonstrated a drastic reduction in memory consumption by quantizing the intermediate activation maps down to INT2 precision. They showed
little to no reduction in performance while achieving large reductions in GPU
memory consumption. In this work, we present an improvement to the EXACT
strategy by using block-wise quantization of the intermediate activation maps.
We experimentally analyze different block sizes and show further reduction in
memory consumption (>15%), and runtime speedup per epoch (about 5%) even when
performing extreme extents of quantization with similar performance trade-offs
as with the original EXACT. Further, we present a correction to the assumptions
on the distribution of intermediate activation maps in EXACT (assumed to be
uniform) and show improved variance estimations of the quantization and
dequantization steps.
( 2
min )
In this paper, we study the effect of popularity degradation bias in the
context of local music recommendations. Specifically, we examine how accurate
two top-performing recommendation algorithms, Weight Relevance Matrix
Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at
recommending artists as a function of artist popularity. We find that both algorithms deliver better recommendation performance for more popular artists and, as such, exhibit popularity degradation bias. While both algorithms produce a
similar level of performance for more popular artists, Mult-VAE shows better
relative performance for less popular artists. This suggests that this
algorithm should be preferred for local (long-tail) music artist
recommendation.
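The bias being measured can be sketched as a popularity-stratified evaluation (the long-tail play counts and the toy recommender's hit probabilities below are simulated assumptions, not the paper's data): a hit-rate profile that rises with popularity is exactly popularity degradation bias.

```python
import numpy as np

def recall_by_popularity(hits, popularity, n_buckets=3):
    """Mean hit rate per popularity bucket (least to most popular).
    A flat profile means no popularity degradation bias; a rising one
    means the recommender favours popular artists."""
    order = np.argsort(popularity)
    buckets = np.array_split(order, n_buckets)
    return [float(np.mean(hits[b])) for b in buckets]

rng = np.random.default_rng(5)
popularity = rng.pareto(1.5, size=3000)          # long-tail play counts
# Toy recommender whose hit probability grows with log-popularity.
p_hit = 1.0 / (1.0 + np.exp(-(np.log1p(popularity) - 1.0)))
hits = rng.random(3000) < p_hit
profile = recall_by_popularity(hits, popularity)
```

Comparing such profiles between two models is how one algorithm (here, analogously to Mult-VAE) can be shown to degrade less on the long tail than another.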
( 2
min )
Social science often relies on surveys of households and individuals. Dozens
of such surveys are regularly administered by the U.S. government. However,
they field independent, unconnected samples with specialized questions,
limiting research questions to those that can be answered by a single survey.
The fusionACS project seeks to integrate data from multiple U.S. household
surveys by statistically "fusing" variables from "donor" surveys onto American
Community Survey (ACS) microdata. This results in an integrated microdataset of
household attributes and well-being dimensions that can be analyzed to address
research questions in ways that are not currently possible. The presented data
comprise the fusion onto the ACS of select donor variables from the Residential
Energy Consumption Survey (RECS) of 2015, the National Household Transportation
Survey (NHTS) of 2017, the American Housing Survey (AHS) of 2019, and the
Consumer Expenditure Survey - Interview (CEI) for the years 2015-2019. The
underlying statistical techniques are included in an open-source R package,
fusionModel, that provides generic tools for the creation, analysis, and
validation of fused microdata.
( 2
min )
Simple regret minimization is a critical problem in learning optimal
treatment assignment policies across various domains, including healthcare and
e-commerce. However, it remains understudied in the contextual bandit setting.
We propose a new family of computationally efficient bandit algorithms for the
stochastic contextual bandit settings, with the flexibility to be adapted for
cumulative regret minimization (with near-optimal minimax guarantees) and
simple regret minimization (with SOTA guarantees). Furthermore, our algorithms
adapt to model misspecification and extend to the continuous arm settings.
These advantages come from constructing and relying on "conformal arm sets"
(CASs), which provide a set of arms at every context that encompass the
context-specific optimal arm with some probability across the context
distribution. Our positive results on simple and cumulative regret guarantees
are contrasted by a negative result, which shows that an algorithm can't
achieve instance-dependent simple regret guarantees while simultaneously
achieving minimax optimal cumulative regret guarantees.
( 2
min )
Discover the obstacles hindering seamless AI adoption in financial services and gain actionable insights to navigate regulatory compliance, data security, organizational change, and more.
The post AI in finance: Addressing hurdles on the path to transformation appeared first on Data Science Central.
( 22
min )
Posted by Cheng-Yu Hsieh, Student Researcher, and Chen-Yu Lee, Research Scientist, Cloud AI Team
Large language models (LLMs) have enabled a new data-efficient learning paradigm wherein they can be used to solve unseen new tasks via zero-shot or few-shot prompting. However, LLMs are challenging to deploy for real-world applications due to their sheer size. For instance, serving a single 175-billion-parameter LLM requires at least 350GB of GPU memory using specialized infrastructure, not to mention that today's state-of-the-art LLMs comprise over 500 billion parameters. Such computational requirements are inaccessible for many research teams, especially for applications that require low-latency performance.
To circumvent these deployment challenges, practitioners often choose to deplo…
( 93
min )
In this post, we discuss how United Airlines, in collaboration with the Amazon Machine Learning Solutions Lab, built an active learning framework on AWS to automate the processing of passenger documents. “In order to deliver the best flying experience for our passengers and make our internal business process as efficient as possible, we have developed […]
( 10
min )
To add to our guidance for optimizing deep learning workloads for sustainability on AWS, this post provides recommendations that are specific to generative AI workloads. In particular, we provide practical best practices for different customization scenarios, including training models from scratch, fine-tuning with additional data using full or parameter-efficient techniques, Retrieval Augmented Generation (RAG), and prompt engineering.
( 10
min )
The NVIDIA Studio laptop lineup is expanding with the new Microsoft Surface Laptop Studio 2, powered by GeForce RTX 4060, GeForce RTX 4050 or NVIDIA RTX 2000 Ada Generation Laptop GPUs, providing powerful performance and versatility for creators.
( 8
min )
Gone are the days when AI was the domain of sprawling data centers or elite researchers. For GeForce RTX users, AI is now running on your PC. It’s personal, enhancing every keystroke, every frame and every moment. Gamers are already enjoying the benefits of AI in over 300 RTX games. Meanwhile, content creators have access
( 8
min )
For seasoned 3D artists and budding digital creation enthusiasts alike, an alpha version of the popular 3D software Blender is elevating creative journeys.
( 7
min )
NVIDIA founder and CEO Jensen Huang will highlight the newest in generative AI and cloud computing at the NVIDIA AI Summit in Tel Aviv from Oct. 15-16. The two-day summit is set to attract more than 2,500 developers, researchers and decision-makers from across one of the world’s most vibrant technology hubs. With over 6,000 startups,
( 5
min )
Time to get the gang back together — PAYDAY 3 streams on GeForce NOW this week. It’s one of 11 titles joining the cloud this week, including Party Animals. The Perfect Heist PAYDAY 3 is the highly anticipated sequel to one of the world’s most popular co-op shooters. Step out of retirement and back into
( 5
min )
Sponsored Post: The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )